Disclosure of Invention
The embodiments of the application provide a training method for a traffic light recognition model, a traffic light recognition method, a training device for the traffic light recognition model, a traffic light recognition device, an electronic device, a computer-readable storage medium, and a computer program product, which are used to solve the technical problem that neural network models in the prior art cannot accurately recognize traffic lights.
According to a first aspect of an embodiment of the present application, there is provided a training method of a traffic signal lamp recognition model, the method including:
Acquiring a training data set comprising a plurality of training samples, wherein each training sample comprises at least two sample images, and the sample images are marked with the actual positions of bounding boxes of imaging contents of traffic lights;
cropping the plurality of sample images, then scaling and arranging the cropped sample images and combining them to obtain a combined image;
inputting the combined image into a first layer of sub-network of an initial neural network model to obtain an adaptive anchor frame of the combined image by the first layer of sub-network and outputting an image to be detected after adaptive scaling;
inputting the image to be detected into a second-layer sub-network of the initial neural network model, to obtain a plurality of feature maps that are output after the second-layer sub-network starts extracting pixels at interlaced row and column intervals from each of a plurality of preset pixels of the image to be detected and performs multiple convolutions on the plurality of subgraphs obtained by the pixel extraction;
Inputting the multiple feature images into a third layer of sub-network of the initial neural network model to obtain a fused feature image obtained after the third layer of sub-network performs feature fusion on the multiple feature images;
Inputting the fusion feature map and the image to be detected into a fourth-layer sub-network of the initial neural network model to obtain a target area mapped in the image to be detected by the fusion feature map matched by the fourth-layer sub-network, and adjusting the initial position of the bounding box according to the target area and then outputting the predicted position of the bounding box;
And calculating a loss function value of a preset neural network loss function according to the actual position and the predicted position, and performing iterative training on the initial neural network model according to the loss function value to obtain a traffic signal lamp identification model.
In one possible implementation, cropping the plurality of sample images includes:
for each sample image, acquiring the horizontal resolution and the vertical resolution of the sample image;
if any one of the horizontal resolution and the vertical resolution does not accord with the multiple of the preset numerical value, calculating the minimum value of the pixels to be cut in the horizontal direction or the vertical direction;
Clipping the sample image according to the minimum value, so that the horizontal resolution and the vertical resolution of the clipped sample image are both accordant with the multiple of the preset numerical value;
Wherein the minimum value is smaller than a preset value.
In one possible implementation manner, scaling and arranging the plurality of cropped sample images and combining them to obtain a combined image includes:
acquiring a preset horizontal resolution and a preset vertical resolution, wherein the preset horizontal resolution and the preset vertical resolution are in accordance with the multiple of a preset numerical value;
determining a first scaling ratio in the horizontal direction and a second scaling ratio in the vertical direction for each cropped sample image, based on the preset horizontal resolution, the preset vertical resolution, and the horizontal and vertical resolutions of the cropped sample image;
and carrying out random arrangement on the scaled sample images and then merging to obtain a merged image.
In one possible implementation manner, inputting the combined image into the first-layer sub-network of the initial neural network model, to obtain the adaptive anchor box of the combined image from the first-layer sub-network and output the adaptively scaled image to be detected, includes:
performing an adaptive anchor box operation on the combined image, and marking the initial positions of all bounding boxes in the combined image to obtain a marked image marked with the initial positions of all bounding boxes;
And carrying out self-adaptive scaling on the marked image to obtain an image to be detected, wherein the horizontal resolution and the vertical resolution of the image to be detected are both in accordance with the multiple of a preset value.
In one possible implementation manner, inputting the image to be detected into the second-layer sub-network of the initial neural network model, to obtain the plurality of feature maps output after the second-layer sub-network starts extracting pixels at interlaced row and column intervals from each of a plurality of preset pixels of the image to be detected and convolves, multiple times, the plurality of subgraphs obtained by the pixel extraction, includes:
determining a plurality of preset pixels in an image to be detected;
for any one preset pixel, starting to extract pixels from the preset pixel in an interlaced interval way, and obtaining a sub-image of the image to be detected;
And for any sub-graph, carrying out convolution processing on the sub-graph for multiple times to extract the characteristics of the sub-graph, and obtaining the characteristic graph corresponding to the sub-graph.
In one possible implementation manner, inputting the fusion feature map and the image to be detected into a fourth layer of sub-network of the initial neural network model to obtain a target area mapped in the image to be detected by the fusion feature map matched by the fourth layer of sub-network, and adjusting the initial position of the bounding box according to the target area to output the predicted position of the bounding box, including:
extracting feature vectors of the fusion feature map, wherein the feature vectors include pixel-value features, texture features, shape features, and spatial-relationship features;
matching, according to the feature vectors, the target area mapped by the fusion feature map in the image to be detected;
and adjusting the initial position of the bounding box according to the target area to obtain the predicted position of the bounding box.
In one possible implementation, the actual position is represented by actual coordinates of all pixels on the bounding box, the predicted position is represented by predicted coordinates of all pixels on the bounding box, a loss function value of a preset neural network loss function is calculated according to the actual position and the predicted position, and an initial neural network model is iteratively trained according to the loss function value to obtain a traffic signal lamp recognition model, including:
inputting the actual coordinates of all pixels on the bounding box and the predicted coordinates of all pixels into a preset neural network loss function to obtain a loss function value;
If the loss function value does not meet the training ending condition of the neural network model, carrying out iterative training on the initial neural network model based on each training sample and the actual position of the bounding box marked in each sample image in each training sample until the training ending condition of the neural network model is met, and obtaining the traffic signal lamp identification model.
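For illustration only, the iterative training described above can be sketched as a toy loop. The sketch below replaces the full network update with direct gradient descent on bounding-box pixel coordinates, so the update rule, learning rate, and end condition are illustrative assumptions, not the embodiment's actual training procedure.

```python
def iterative_fit(pred, actual, lr=0.1, tol=1e-6, max_iter=1000):
    """Nudge predicted bounding-box coordinates toward the actual ones
    until the squared-error loss meets the training end condition."""
    loss = sum((p - a) ** 2 for p, a in zip(pred, actual))
    for _ in range(max_iter):
        if loss < tol:          # training end condition met
            break
        # gradient-descent step on the squared-error loss
        pred = [p - lr * 2 * (p - a) for p, a in zip(pred, actual)]
        loss = sum((p - a) ** 2 for p, a in zip(pred, actual))
    return pred, loss
```

In the same way, the real model keeps updating its parameters until the loss function value satisfies the training end condition.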
According to a second aspect of an embodiment of the present application, there is provided a method for identifying a traffic signal, the method including:
acquiring an image to be detected at a preset position, wherein the image to be detected comprises imaging content of a traffic signal lamp;
Inputting the image to be detected into the traffic light identification model to obtain the position of a bounding box, which is output by the traffic light identification model and is used for marking imaging content, in the image to be detected;
identifying the color of the area inside the bounding box through a preset color model, taking that color as the color of the traffic light, and determining the state of the traffic light according to its color;
The traffic light recognition model is trained by the method of the first aspect.
In one possible implementation, the state of the traffic light includes permitting passage, warning, and prohibiting passage, and determining the state of the traffic light based on its color includes:
if the color of the traffic light is green, determining that the state of the traffic light is permitting passage;
if the color of the traffic light is yellow, determining that the state of the traffic light is warning;
and if the color of the traffic light is red, determining that the state of the traffic light is prohibiting passage.
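The color-to-state rule above can be sketched as a simple lookup; the state names below are paraphrases for illustration, not the embodiment's exact wording.

```python
def light_state(color: str) -> str:
    """Map a recognized traffic-light color to a traffic state."""
    states = {
        "green": "permit passage",
        "yellow": "warning",
        "red": "prohibit passage",
    }
    return states.get(color, "unknown")
```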
According to a third aspect of the embodiments of the application, a training device for a traffic light recognition model is provided, the device including:
an acquisition module, configured to acquire a training data set including a plurality of training samples, where each training sample includes at least two sample images, and the sample images are marked with the actual positions of bounding boxes of imaging content of traffic lights;
a merging module, configured to crop the plurality of sample images, and to scale, arrange, and combine the cropped sample images to obtain a combined image;
a to-be-detected image obtaining module, configured to input the combined image into the first-layer sub-network of the initial neural network model, to obtain the adaptive anchor box of the combined image from the first-layer sub-network and output the adaptively scaled image to be detected;
a feature map obtaining module, configured to input the image to be detected into the second-layer sub-network of the initial neural network model, to obtain the plurality of feature maps output after the second-layer sub-network starts extracting pixels at interlaced row and column intervals from the plurality of preset pixels of the image to be detected and performs convolution processing on the subgraphs obtained by the pixel extraction;
The fusion feature map obtaining module is used for inputting the multiple feature maps to a third layer of sub-network of the initial neural network model to obtain a fusion feature map obtained after the third layer of sub-network performs feature fusion on the multiple feature maps;
The prediction position output module is used for inputting the fusion feature image and the image to be detected into a fourth layer of sub-network of the initial neural network model, obtaining a target area mapped in the image to be detected by the fusion feature image matched by the fourth layer of sub-network, and adjusting the initial position of the bounding box according to the target area and then outputting the prediction position of the bounding box;
And the iteration module is used for calculating a loss function value of a preset neural network loss function according to the actual position and the predicted position, and carrying out iteration training on the initial neural network model according to the loss function value to obtain the traffic signal lamp identification model.
According to a fourth aspect of an embodiment of the present application, there is provided an identification device for a traffic signal, the device including:
a to-be-detected image acquisition module, configured to acquire an image to be detected at a preset position, where the image to be detected includes imaging content of a traffic light;
a bounding box position recognition module, configured to input the image to be detected into the traffic light recognition model, to obtain the position, output by the traffic light recognition model, of the bounding box marking the imaging content in the image to be detected;
a state recognition module, configured to identify the color of the area inside the bounding box through a preset color model, take that color as the color of the traffic light, and determine the state of the traffic light according to its color;
wherein the traffic light recognition model is trained by the method of the first aspect.
According to a fifth aspect of an embodiment of the present application there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as provided in the first and second aspects when the program is executed.
According to a sixth aspect of embodiments of the present application, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as provided by the first and second aspects.
According to a seventh aspect of embodiments of the present application, there is provided a computer program product comprising computer instructions stored in a computer readable storage medium, which when read from the computer readable storage medium by a processor of a computer device, cause the computer device to perform the steps of the method as provided in the first and second aspects.
The traffic light recognition model trained by the embodiments of the application can identify the bounding box of the imaging content of a traffic light in an image and remove interfering factors in the image, so that the traffic light is recognized accurately, improving recognition accuracy.
Detailed Description
Embodiments of the present application are described below with reference to the drawings in the present application. It should be understood that the embodiments described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application, and the technical solutions of the embodiments of the present application are not limited.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "comprises" and "comprising", when used in this specification, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" indicates at least one of the items it joins; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B".
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Target vehicle A is currently in an automatic driving mode, and a traffic light is shown about 500 meters ahead in the navigation map. At this moment, a camera in target vehicle A can acquire a video stream containing the imaging content of the traffic light ahead, and the video stream is decoded to obtain an image to be detected. The color of the traffic light may then be detected in the following ways.
The first is an RGB color model method: the image to be detected containing the imaging content of the traffic light is input into an RGB color model, which identifies the color of the traffic light in the image. However, the image to be detected contains many background colors, such as the colors of LED lamps on surrounding buildings, and these background colors can make the identified traffic light color highly inaccurate.
The second is an HSI color model method: the image to be detected containing the imaging content of the traffic light is input into an HSI color model, which identifies the color of the traffic light in the image. The HSI color space has characteristics such as illumination invariance and good robustness, but it still cannot remove the influence of background colors.
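A hue-based color model of the kind discussed above can be sketched with the standard library; the hue and saturation thresholds below are illustrative assumptions, not the embodiment's parameters, and classification is done on a single RGB value (e.g. the average color inside the bounding box).

```python
import colorsys

def classify_light_color(r: int, g: int, b: int) -> str:
    """Classify an RGB value as red, yellow, or green by hue thresholds
    in HSV space (thresholds are illustrative assumptions)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    deg = h * 360.0
    if s < 0.3 or v < 0.3:
        return "unknown"        # too desaturated or dark to be a lit lamp
    if deg < 30 or deg >= 330:
        return "red"
    if 30 <= deg < 75:
        return "yellow"
    if 75 <= deg < 180:
        return "green"
    return "unknown"
```

This also shows why such models struggle with background colors: any bright pixel in these hue ranges, whether lamp or LED billboard, classifies the same way.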
The application provides a training method for a traffic light recognition model, a traffic light recognition method, a training device for the traffic light recognition model, a traffic light recognition device, an electronic device, a computer-readable storage medium, and a computer program product, aiming to solve the above technical problems in the prior art.
The technical solutions of the embodiments of the present application and technical effects produced by the technical solutions of the present application are described below by describing several exemplary embodiments. It should be noted that the following embodiments may be referred to, or combined with each other, and the description will not be repeated for the same terms, similar features, similar implementation steps, and the like in different embodiments.
The embodiment of the application provides a method for training a traffic signal lamp identification model, which is shown in fig. 1 and comprises the following steps:
Step S101, a training data set comprising a plurality of training samples is obtained, wherein each training sample comprises at least two sample images, and the actual positions of bounding boxes of imaging contents of traffic lights are marked on the sample images.
The embodiment of the application comprises a plurality of sample images in each training sample, namely, the plurality of sample images can be processed at one time in the model training stage.
In practical application, the traffic signal lamp has various shapes, such as a round lamp, a left arrow lamp or a right arrow lamp, etc., and the imaging content of the traffic signal lamp in the image to be measured in the embodiment of the application can be any shape.
During the driving of the vehicle, a video stream containing the imaging content of the traffic light in front of the vehicle may be photographed in real time by an image capturing device (e.g., a camera) on the vehicle, and then the video stream may be decoded based on FFMPEG (Fast Forward Mpeg, multimedia video processing tool) or OpenCV (cross-platform computer vision and machine learning software library issued based on BSD license) to extract images of each frame containing the imaging content of the traffic light.
The actual position of the bounding box marked with the imaging content of the traffic light in a sample image serves as the training label.
Step S102, cutting a plurality of sample images, scaling and arranging the cut sample images, and combining to obtain a combined image.
After the plurality of sample images are obtained, they are cropped so that the cropped sample images meet the size requirements; the detailed cropping process is described later.
In the embodiment of the application, after the plurality of sample images are cropped, the cropped sample images are scaled, arranged, and combined to obtain a combined image. Combining multiple images enriches the detection targets in the image and improves the robustness of image processing.
Step S103, inputting the combined image into a first layer of sub-network of the initial neural network model, obtaining an adaptive anchor frame of the combined image by the first layer of sub-network and outputting an image to be detected after adaptive scaling, wherein the initial positions of all bounding boxes are marked in the image to be detected.
After the merged image is obtained, the merged image is input into a first layer of sub-network of an initial neural network model, the first layer of sub-network carries out self-adaptive anchor frame on the merged image, the self-adaptive anchor frame is used for determining the initial position of a bounding box of the traffic signal lamp, the initial position is the position where the bounding box possibly appears, and the marked image marked with the initial positions of all bounding boxes is obtained after the self-adaptive anchor frame operation is carried out on the merged image.
After the marked image is obtained, it is adaptively scaled: the horizontal resolution and the vertical resolution of the marked image are scaled, and the image to be detected is obtained after the adaptive scaling.
Step S104, inputting the image to be detected into the second-layer sub-network of the initial neural network model, to obtain a plurality of feature maps output after the second-layer sub-network starts extracting pixels at interlaced row and column intervals from each of a plurality of preset pixels of the image to be detected and performs convolution processing on the plurality of subgraphs obtained by the pixel extraction.
After the image to be detected output by the first-layer sub-network is obtained, it is input into the second-layer sub-network. After receiving the image to be detected, the second-layer sub-network starts extracting pixels at interlaced row and column intervals from each of a plurality of preset pixels. The preset pixels may be the 4 pixels in the overlapping part of the first two rows and the first two columns; taking each preset pixel as a starting pixel, pixels are extracted at interlaced row and column intervals to obtain one subgraph, so the 4 preset pixels yield 4 subgraphs.
Fig. 2 is a schematic diagram of the process of obtaining the plurality of subgraphs by starting pixel extraction at the preset pixels of the image to be detected. Assuming the image to be detected is 4×4, pixel extraction at interlaced intervals starts from each of the 4 pixels overlapping the first two rows and the first two columns, yielding four 2×2 subgraphs.
After the interlaced row- and column-interval pixel extraction is performed on the image to be detected, a plurality of subgraphs of the image to be detected are obtained; these subgraphs can be recombined into the complete image to be detected without any information loss.
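The interlaced extraction described above (splitting a 4×4 image into four 2×2 subgraphs) can be sketched with plain list slicing; this is an illustrative sketch of the operation, with the image modeled as a 2-D list of pixel values.

```python
def split_interlaced(img):
    """Split an image (2-D list) into four subgraphs by extracting pixels
    at interlaced row/column intervals, starting from each of the 4 pixels
    in the overlap of the first two rows and first two columns. Together
    the subgraphs contain every pixel exactly once (no information loss)."""
    return [
        [row[c0::2] for row in img[r0::2]]
        for r0 in (0, 1)
        for c0 in (0, 1)
    ]

img = [[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]]
subs = split_interlaced(img)  # four 2x2 subgraphs
```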
After the plurality of subgraphs are obtained, convolution processing is performed on each subgraph to extract image features. The convolution may be repeated several times so that the extracted image features are relatively stable; the extracted features include pixel-value features, texture features, shape features, spatial-relationship features, and the like of each pixel.
Step S105, inputting the feature images into a third layer of sub-network of the initial neural network model to obtain a fused feature image obtained after the third layer of sub-network performs feature fusion on the feature images.
Convolution processing is performed on each of the plurality of subgraphs to extract image features, so that the various features of the subgraphs are obtained completely. The features in each subgraph's feature map are features of the image to be detected; the plurality of feature maps have both identical and different parts, so they need to be fused, merging the identical features to avoid wasting computing resources.
And S106, inputting the fusion feature map and the image to be detected into a fourth-layer sub-network of the initial neural network model, obtaining a target area mapped by the fusion feature map matched by the fourth-layer sub-network in the image to be detected, and adjusting the initial position of the bounding box according to the target area and then outputting the predicted position of the bounding box.
After the fusion feature map is obtained, the fusion feature map and the image to be detected are input into a fourth-layer sub-network, the fourth-layer sub-network matches a target area mapped by the fusion feature map in the image to be detected according to the features in the fusion feature map, the target area is an area where imaging content is located, the initial position of the bounding box is adjusted according to the target area, namely the position of the bounding box is adjusted to the position where the target area is located, and the position is the predicted position of the bounding box output by the fourth-layer sub-network.
And step S107, calculating a loss function value of a preset neural network loss function according to the actual position and the predicted position, and performing iterative training on the initial neural network model according to the loss function value to obtain a traffic signal lamp identification model.
After determining the predicted position of the bounding box, the embodiment of the application calculates the loss function value of the preset neural network loss function according to the predicted position and the actual position, where the neural network loss function may be defined as

L = (1/m) * Σ_{i=1}^{m} [ (x̂_i − x0_i)² + (ŷ_i − y0_i)² ]

where i denotes the i-th pixel on the bounding box, m denotes the total number of pixels on the bounding box, (x̂_i, ŷ_i) is a pixel of the predicted position of the bounding box, and (x0_i, y0_i) is the corresponding pixel of the actual position of the bounding box. In the embodiment of the application, a loss function value is obtained after each pass, and the neural network model is iteratively trained according to that value; that is, the network parameters of the neural network model are updated to adjust the predicted position of the bounding box. When the value of the loss function is at its minimum, i.e., the derivative of the loss function infinitely approaches 0, the neural network loss function converges; in this case the predicted position of the bounding box is closest to the actual position, and the iteratively trained initial neural network model becomes the traffic light recognition model.
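The loss over the m bounding-box pixels can be computed directly; the sketch below assumes a mean-squared-distance form consistent with the symbol definitions above (the exact form is not given verbatim in the text), with positions passed as lists of (x, y) tuples.

```python
def bbox_loss(pred, actual):
    """Mean squared distance between predicted and actual coordinates
    of the m pixels on the bounding box."""
    m = len(pred)
    return sum((xp - xa) ** 2 + (yp - ya) ** 2
               for (xp, yp), (xa, ya) in zip(pred, actual)) / m
```

When the predicted and actual positions coincide, the loss is 0, which is the convergence target of the iterative training.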
The traffic light recognition model trained by the embodiments of the application can identify the bounding box of the imaging content of a traffic light in an image and remove interfering factors in the image, so that the traffic light is recognized accurately, improving recognition accuracy.
The embodiment of the application provides a possible implementation manner, and the clipping of a plurality of sample images comprises the following steps:
for each sample image, acquiring the horizontal resolution and the vertical resolution of the sample image;
if any one of the horizontal resolution and the vertical resolution does not accord with the multiple of the preset numerical value, calculating the minimum value of the pixels to be cut in the horizontal direction or the vertical direction;
Clipping the sample image according to the minimum value, so that the horizontal resolution and the vertical resolution of the clipped sample image are both accordant with the multiple of the preset numerical value;
Wherein the minimum value is smaller than a preset value.
In the embodiment of the application, each sample image has a corresponding size, the image sizes are embodied in two aspects, namely, horizontal resolution and vertical resolution, the horizontal resolution represents the number of pixels in the horizontal direction, and the vertical resolution represents the number of pixels in the vertical direction.
It can be understood that, to facilitate subsequent processing of a sample image, its size must meet the requirements. The preset numerical value may be, for example, 4; if either the horizontal resolution or the vertical resolution is not a multiple of the preset value, it is determined that the sample image needs to be cropped so that both resolutions of the cropped sample image are multiples of the preset value.
It should be noted that, in the embodiment of the present application, the cropping of the sample image is fine cropping. When either the horizontal resolution or the vertical resolution is not a multiple of the preset value, excessive cropping could cut away the imaging content of the traffic light. To avoid over-cropping, the minimum number of pixels that must be cropped for each resolution to become a multiple of the preset value is calculated, and the sample image is cropped by that minimum, so that both the horizontal and vertical resolutions of the cropped sample image are multiples of the preset value.
Specifically, assume that the horizontal resolution of a sample image is 101, the vertical resolution is 102, and the preset value is 4. Since 101 leaves a remainder of 1 when divided by 4, and 102 leaves a remainder of 2, the minimum number of pixels to crop in the horizontal direction is 1, and the minimum number of pixels to crop in the vertical direction is 2.
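The minimum crop is simply the remainder of the resolution divided by the preset value; a one-line sketch of the calculation above:

```python
def min_crop(resolution: int, preset: int = 4) -> int:
    """Minimum number of pixels to crop so the resolution becomes a
    multiple of the preset value (4 in the example above)."""
    return resolution % preset
```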
The embodiment of the application provides a possible implementation manner, which comprises the steps of scaling and arranging a plurality of cut sample images and then combining the sample images to obtain a combined image, wherein the method comprises the following steps of:
acquiring a preset horizontal resolution and a preset vertical resolution, wherein the preset horizontal resolution and the preset vertical resolution are in accordance with the multiple of a preset numerical value;
determining a first scaling ratio in the horizontal direction and a second scaling ratio in the vertical direction for each cropped sample image, based on the preset horizontal resolution, the preset vertical resolution, and the horizontal and vertical resolutions of the cropped sample image;
and carrying out random arrangement on the scaled sample images and then merging to obtain a merged image.
The sample images collected by a target vehicle while driving are usually images of the area in front of the vehicle and contain few detection targets (a detection target being the imaging content of a traffic light). To improve detection accuracy, the number of detection targets in the image input to the initial neural network model needs to be enriched; a combined image containing multiple detection targets can be obtained by scaling, arranging, and combining the plurality of sample images.
In the embodiment of the application, a preset horizontal resolution and a preset vertical resolution are set for the scaled sample images, both being multiples of the preset value. A first scaling ratio in the horizontal direction and a second scaling ratio in the vertical direction are calculated for each cropped sample image from the preset horizontal resolution, the preset vertical resolution, and the horizontal and vertical resolutions of the cropped sample image. Each cropped sample image is scaled according to its first and second scaling ratios, and the scaled sample images are randomly arranged and combined to obtain the combined image.
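Taking the scaling ratios as the quotient of preset resolution over actual resolution, the calculation can be sketched as follows; how the embodiment derives the per-image preset resolution (e.g. per grid cell of the combined image) is an assumption here.

```python
def scale_ratios(width: int, height: int, preset_w: int, preset_h: int):
    """First (horizontal) and second (vertical) scaling ratios mapping a
    cropped sample image onto the preset resolution."""
    return preset_w / width, preset_h / height
```

For example, a 200×100 cropped image targeted at a 100×50 slot scales by 0.5 in both directions.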
The embodiment of the application provides a possible implementation manner, in which inputting the merged image to the first-layer sub-network of the initial neural network model to obtain the image to be detected output after the first-layer sub-network performs adaptive anchor-box computation and adaptive scaling on the merged image comprises the following steps:
performing an adaptive anchor-box operation on the merged image to mark the initial positions of all bounding boxes in the merged image, obtaining a marked image annotated with the initial positions of all bounding boxes;
and performing adaptive scaling on the marked image to obtain the image to be detected, where the horizontal resolution and the vertical resolution of the image to be detected are both multiples of the preset numerical value.
After the merged image is obtained, it is input into the first-layer sub-network of the initial neural network model, which performs adaptive anchor-box computation on it. The adaptive anchor box is used to determine the initial position of the bounding box of a traffic signal lamp, that is, a position where the bounding box may appear; after the adaptive anchor-box operation, a marked image annotated with the initial positions of all bounding boxes is obtained. The marked image then needs to be adaptively scaled: its horizontal and vertical resolutions are scaled so that both the horizontal resolution and the vertical resolution of the resulting image to be detected are multiples of the preset numerical value.
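A minimal sketch of the adaptive-scaling step might look as follows, again assuming 32 as the preset numerical value; it rounds each resolution to the nearest multiple of the preset value and resamples accordingly:

```python
import numpy as np

PRESET = 32  # assumed preset numerical value

def adaptive_scale(img: np.ndarray) -> np.ndarray:
    """Scale the marked image so that both its horizontal and vertical
    resolutions are multiples of PRESET (illustrative nearest-neighbour resize)."""
    h, w = img.shape[:2]
    new_h = max(PRESET, round(h / PRESET) * PRESET)
    new_w = max(PRESET, round(w / PRESET) * PRESET)
    ys = np.linspace(0, h - 1, new_h).astype(int)
    xs = np.linspace(0, w - 1, new_w).astype(int)
    return img[ys][:, xs]
```

For example, a 100x250 marked image would be resampled to 96x256, both dimensions being multiples of 32.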
The embodiment of the application provides a possible implementation manner, in which inputting the image to be detected to the second-layer sub-network of the initial neural network model to obtain the plurality of feature maps, which the second-layer sub-network outputs after extracting pixels at interlaced row and column intervals starting from each of a plurality of preset pixels of the image to be detected and convolving the resulting subgraphs multiple times, comprises the following steps:
determining a plurality of preset pixels in an image to be detected;
for any one preset pixel, extracting pixels at interlaced row and column intervals starting from that preset pixel, to obtain one subgraph of the image to be detected;
And for any sub-graph, carrying out convolution processing on the sub-graph for multiple times to extract the characteristics of the sub-graph, and obtaining the characteristic graph corresponding to the sub-graph.
The preset pixels in the embodiment of the application may be the 4 pixels where the first two rows and the first two columns overlap. Taking each preset pixel as a starting pixel and extracting pixels at interlaced row and column intervals yields one subgraph; since there are 4 preset pixels, 4 subgraphs can be extracted.
After the multiple subgraphs are obtained, the embodiment of the application convolves each subgraph to extract image features, yielding various features of the subgraph, such as pixel-value features, texture features, shape features, and spatial-relationship features. The features in each subgraph's feature map are also features of the image to be detected; the feature maps share some parts and differ in others. To avoid occupying excessive resources, the feature maps are fused to obtain a fused feature map, which is the feature map of the image to be detected.
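The interlaced extraction from the 4 overlapping start pixels can be sketched with array slicing, paired with a minimal stand-in convolution; this is an illustration of the slicing pattern only, and the function names and the 3x3 kernel are assumptions:

```python
import numpy as np

def interlaced_subgraphs(img: np.ndarray) -> list:
    """Extract four subgraphs by taking every other row and column, starting
    from each of the 4 pixels where the first two rows and columns overlap."""
    return [
        img[0::2, 0::2],  # start pixel (0, 0)
        img[0::2, 1::2],  # start pixel (0, 1)
        img[1::2, 0::2],  # start pixel (1, 0)
        img[1::2, 1::2],  # start pixel (1, 1)
    ]

def conv2d(sub: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """A minimal valid-mode 2-D convolution standing in for the repeated
    convolutions that turn each subgraph into a feature map."""
    kh, kw = kernel.shape
    h, w = sub.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(sub[i:i + kh, j:j + kw] * kernel)
    return out
```

For an 8x8 image to be detected, each of the 4 subgraphs is 4x4, and together they retain every pixel of the original image with no loss of information.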
The embodiment of the application provides a possible implementation manner, in which inputting the fused feature map and the image to be detected into the fourth-layer sub-network of the initial neural network model to obtain the target area that the fourth-layer sub-network maps the fused feature map to in the image to be detected, adjusting the initial position of the bounding box according to the target area, and then outputting the predicted position of the bounding box comprises the following steps:
extracting feature vectors of the fused feature map, where the feature vectors include pixel-value features, texture features, shape features, and spatial-relationship features;
determining the target area that the feature vectors map to in the image to be detected;
and adjusting the initial position of the bounding box according to the target area to obtain the predicted position of the bounding box.
After the fusion feature map is obtained, the embodiment of the application extracts the feature vector in the fusion feature map, wherein each parameter of the feature vector represents a feature, and the feature comprises a pixel value feature, a texture feature, a shape feature, a spatial relationship feature and the like.
After extracting the feature vector, determining a target area mapped by the feature vector in the image to be detected, wherein the target area is the area where the imaging content of the traffic signal lamp is located, and adjusting the initial position of the bounding box according to the target area to obtain the predicted position of the bounding box.
In addition, when determining the predicted position of the bounding box, there may be multiple candidate predicted positions. In that case, an optimal predicted position, that is, the predicted position of the bounding box to be output, is determined by a non-maximum suppression (NMS) algorithm.
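A standard NMS procedure, which the embodiment may use here, can be sketched as follows; the overlap threshold of 0.5 is an assumed illustrative value:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring candidate position, drop any remaining
    candidate overlapping it beyond `thresh`, and repeat; returns the
    indices of the kept predicted positions."""
    order = np.argsort(scores)[::-1].tolist()
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

Two heavily overlapping candidate boxes thus collapse to the single higher-scoring one, which becomes the predicted position to be output.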
The embodiment of the application provides a possible implementation manner, in which the actual position is represented by the actual coordinates of all pixels on the bounding box and the predicted position is represented by the predicted coordinates of all pixels on the bounding box, and calculating a loss function value of a preset neural network loss function according to the actual position and the predicted position and iteratively training the initial neural network model according to the loss function value to obtain the traffic signal lamp recognition model comprises the following steps:
inputting the actual coordinates of all pixels and the predicted coordinates of all pixels into the preset neural network loss function to obtain the loss function value;
If the loss function value does not meet the training ending condition of the neural network model, continuing training the adjusted model based on the training samples and the actual positions of the bounding boxes marked in the sample images in the training samples until the training ending condition of the neural network model is met, and obtaining the traffic signal lamp identification model.
It can be understood that the position of the bounding box can be represented by the coordinates of all pixels on the bounding box; that is, the actual position of the bounding box is represented by the actual coordinates of all pixels on the bounding box, and the predicted position by their predicted coordinates. The actual coordinates and the predicted coordinates of all pixels on the bounding box are input into the preset neural network loss function to obtain the loss function value, where the preset loss function is:
J(θ) = (1/2m) Σ_{i=1}^{m} [ (x_θ^(i) − x_0^(i))² + (y_θ^(i) − y_0^(i))² ]
where J(θ) represents the loss function value, i indexes the i-th pixel on the bounding box, m is the total number of pixels on the bounding box, (x_θ, y_θ) is a pixel at the predicted position of the bounding box, and (x_0, y_0) is the corresponding pixel at the actual position of the bounding box.
If the loss function value does not meet the training ending condition of the neural network model, the neural network model has not yet been trained to convergence, that is, the loss function value is not at its minimum. In that case, the adjusted model needs to be iteratively trained further, based on each training sample and the actual positions of the bounding boxes marked in each sample image of each training sample, until the training ending condition of the neural network model is met, obtaining the traffic signal lamp recognition model.
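Assuming the loss takes the mean-squared form over the m pixel coordinates described above (an assumption for illustration, since only the symbols are defined in the text), the loss function value can be computed as:

```python
import numpy as np

def coordinate_loss(pred, actual):
    """J(theta): sum of squared distances between the predicted and actual
    coordinates of the m pixels on the bounding box, divided by 2m
    (assumed mean-squared form).

    pred, actual: array-likes of shape (m, 2) holding (x, y) per pixel.
    """
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    m = pred.shape[0]  # total number of pixels on the bounding box
    return np.sum((pred - actual) ** 2) / (2 * m)
```

When the predicted coordinates coincide with the actual coordinates the loss is zero, which corresponds to the converged state that ends training.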
The embodiment of the application provides a traffic signal lamp identification method, which comprises the following steps as shown in fig. 3:
Step S301, acquiring an image to be detected at a preset position, wherein the image to be detected comprises imaging content of a traffic signal lamp;
step S302, inputting an image to be detected into a traffic signal lamp identification model to obtain the position of a bounding box, which is output by the traffic signal lamp identification model and is used for marking imaging content, in the image to be detected;
step S303, identifying the color of the area in the bounding box through a preset color model, taking the color of the area in the bounding box as the color of the traffic signal lamp, and determining the state of the traffic signal lamp according to the color of the traffic signal lamp;
The traffic light recognition model is trained by the method of the above embodiments.
The preset position in the embodiment of the application may be, for example, a position near a traffic signal lamp. The image to be detected containing the imaging content of the traffic signal lamp is collected at that position and input directly into the trained traffic signal lamp recognition model, which outputs the position, in the image to be detected, of the bounding box used to mark the imaging content.
After the position of the bounding box is obtained, the color of the area in the bounding box is identified through a preset color model, where the preset color model may be an RGB color model, an HSI color model, an HSV color model, or the like.
The RGB (Red-Green-Blue) color model obtains various colors by varying the red, green, and blue color channels and superimposing them.
The HSI (Hue-Saturation-Intensity) color model describes color characteristics with the H, S, and I parameters, where H defines the frequency of the color, called hue; S indicates the depth of the color, called saturation; and I represents intensity or brightness.
The HSV (Hue-Saturation-Value) color model describes color characteristics with the H, S, and V parameters, where H is the hue, S is the saturation of the color, and V is the value (lightness).
Of course, the color model preset in the embodiment of the present application may be other color models besides the above models, which is not limited in the embodiment of the present application.
After the color of the area in the bounding box is identified, the color of the area in the bounding box is used as the color of the traffic signal lamp, and the state of the traffic signal lamp is determined according to the color of the traffic signal lamp.
The embodiment of the application provides a possible implementation manner, in which the state of the traffic signal lamp includes permitted to pass, warning, and prohibited from passing, and determining the state of the traffic signal lamp according to the color of the traffic signal lamp comprises the following steps:
if the color of the traffic signal lamp is green, determining that the state of the traffic signal lamp is permitted to pass;
if the color of the traffic signal lamp is yellow, determining that the state of the traffic signal lamp is warning;
and if the color of the traffic signal lamp is red, determining that the state of the traffic signal lamp is prohibited from passing.
In the embodiment of the application, after the bounding box of the imaging content of the traffic signal lamp is identified in the image to be detected by the traffic signal lamp recognition model, the color within the bounding box is identified directly; that color is the color of the traffic signal lamp, and the state of the traffic signal lamp is determined from it. Specifically, if the color within the bounding box is green, the state of the traffic signal lamp is determined to be permitted to pass; if the color is red, the state is determined to be prohibited from passing; and if the color is yellow, the state is determined to be warning. In the field of automatic driving or assisted driving, whether the vehicle may pass can then be determined according to the identified state of the traffic signal lamp.
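As an illustrative sketch of the color-to-state mapping, the following classifies the mean color of the bounding-box region via its HSV hue; the hue thresholds and function name are assumptions for illustration, not claimed values:

```python
import colorsys

def classify_light(region_rgb):
    """Classify the dominant color inside the bounding box as a signal state,
    using the HSV hue of the mean RGB value (thresholds are illustrative).

    region_rgb: iterable of (r, g, b) pixel tuples, 0-255 per channel.
    """
    pixels = list(region_rgb)
    r = sum(p[0] for p in pixels) / len(pixels)
    g = sum(p[1] for p in pixels) / len(pixels)
    b = sum(p[2] for p in pixels) / len(pixels)
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue = h * 360
    if hue < 20 or hue >= 330:
        return "prohibited"   # red  -> prohibited from passing
    if 20 <= hue < 70:
        return "warning"      # yellow -> warning
    if 70 <= hue < 170:
        return "permitted"    # green -> permitted to pass
    return "unknown"
```

A pure-red region maps to prohibited, pure yellow to warning, and pure green to permitted, matching the state mapping described above.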
The traffic signal lamp recognition method provided by the embodiment of the application is described with a specific embodiment. A target vehicle A starts an unmanned driving mode on a certain road, and there is a traffic signal lamp at the intersection ahead. Video containing the imaging content of the traffic signal lamp ahead is collected in real time through a camera on the vehicle, and video frames are extracted in real time to obtain multiple images to be detected. Each image to be detected is input into the traffic signal lamp recognition model to obtain the position, output by the model, of the bounding box marking the imaging content of the traffic signal lamp in the image to be detected. After the position of the bounding box is determined, the color of the area within the bounding box is identified through an RGB color model. The color within the bounding box area is green, so the current color of the traffic signal lamp is determined to be green, the state of the current traffic signal lamp is determined to be permitted to pass, and target vehicle A passes directly through the intersection according to the determined state.
The embodiment of the application provides a training device 40 for a traffic light identification model, as shown in fig. 4, the training device 40 for a traffic light identification model may include:
A training data set obtaining module 410, configured to obtain a training data set including a plurality of training samples, where each training sample includes at least two sample images, and the sample images are marked with actual positions of bounding boxes of imaging contents of traffic lights;
The cropping merging module 420 is configured to crop the plurality of sample images, scale and arrange the plurality of cropped sample images, and then merge the plurality of cropped sample images to obtain a merged image;
the first layer sub-network processing module 430 is configured to input the combined image to a first layer sub-network of the initial neural network model, obtain an adaptive anchor frame of the combined image by the first layer sub-network and output an image to be tested after adaptive scaling;
the second layer sub-network processing module 440 is configured to input an image to be measured to a second layer sub-network of the initial neural network model, obtain a plurality of feature images that are output after the second layer neural network starts to extract pixels at intervals of interlacing at a plurality of preset pixels of the image to be measured and convolves a plurality of sub-images obtained after the pixels are extracted;
the third-layer sub-network processing module 450 is configured to input the multiple feature maps to a third-layer sub-network of the initial neural network model, and obtain a fused feature map obtained by feature fusion of the multiple feature maps by the third-layer sub-network;
the fourth-layer sub-network processing module 460 is configured to input the fusion feature map and the image to be detected to a fourth-layer sub-network of the initial neural network model, obtain a target area mapped in the image to be detected by the fusion feature map matched by the fourth-layer sub-network, and adjust an initial position of the bounding box according to the target area, and then output a predicted position of the bounding box;
The iterative training module 470 is configured to calculate a loss function value of a preset neural network loss function according to the actual position and the predicted position, and perform iterative training on the initial neural network model according to the loss function value, so as to obtain a traffic signal lamp recognition model.
The traffic signal lamp recognition model trained by the embodiment of the application can identify the bounding box of the imaging content of a traffic signal lamp in an image and remove interference factors in the image, so that the traffic signal lamp is accurately identified and recognition accuracy is improved.
The embodiment of the application provides a possible implementation manner, and the clipping merging module comprises:
A horizontal resolution and vertical resolution obtaining sub-module for obtaining the horizontal resolution and vertical resolution of the sample image for each sample image;
And the cutting sub-module is used for cutting the sample image according to the minimum value so that the horizontal resolution and the vertical resolution of the cut sample image are both accordant with the multiple of the preset numerical value, wherein the minimum value is smaller than the preset numerical value.
The embodiment of the application provides a possible implementation manner, and the clipping merging module further comprises:
The preset resolution obtaining submodule is used for obtaining preset horizontal resolution and preset vertical resolution, and the preset horizontal resolution and the preset vertical resolution are in accordance with the multiple of preset numerical values;
The scaling submodule is used for determining a first scaling proportion of each cut sample image in the horizontal direction and a second scaling proportion in the vertical direction based on the preset vertical resolution, the horizontal resolution and the vertical resolution of the cut sample image;
and the merging sub-module is used for merging the scaled sample images after random arrangement to obtain a merged image.
The embodiment of the application provides a possible implementation manner, in which the first-layer sub-network processing module comprises:
the adaptive anchor-box sub-module, configured to perform an adaptive anchor-box operation on the merged image, mark the initial positions of all bounding boxes in the merged image, and obtain a marked image annotated with the initial positions of all bounding boxes;
The self-adaptive scaling sub-module is used for carrying out self-adaptive scaling on the marked image to obtain an image to be measured, and the horizontal resolution and the vertical resolution of the image to be measured are both in accordance with the multiple of a preset numerical value.
The embodiment of the application provides a possible implementation manner, and the second layer sub-network processing module comprises:
a preset pixel determination submodule, configured to determine a plurality of preset pixels in an image to be measured;
An interlaced interval extraction sub-module, configured to, for any one preset pixel, start interlaced interval extraction from the preset pixel to obtain a sub-image of the image to be detected;
And the characteristic diagram obtaining sub-module is used for carrying out convolution processing on any sub-diagram for a plurality of times to extract the characteristics of the sub-diagram so as to obtain the characteristic diagram corresponding to the sub-diagram.
The embodiment of the application provides a possible implementation manner, and the fourth layer sub-network processing module comprises:
The feature vector extraction submodule is used for extracting feature vectors of the fusion feature map, wherein the feature vectors comprise pixel value features, texture features, shape features and spatial relationship features;
and the adjustment sub-module is used for determining a target area mapped by the feature vector in the image to be measured, and adjusting the initial position of the bounding box according to the target area to obtain the predicted position of the bounding box.
The embodiment of the application provides a possible implementation mode, wherein the actual position is represented by the actual coordinates of all pixels on the bounding box, the predicted position is represented by the predicted coordinates of all pixels on the bounding box, and the iterative training module comprises:
the loss function value determination sub-module, configured to input the actual coordinates and the predicted coordinates of all pixels on the bounding box into the preset neural network loss function to obtain the loss function value;
And the iterative training sub-module is used for continuing to iteratively train the initial neural network model based on the actual positions of each training sample and the bounding boxes marked in each sample image in each training sample until the training ending condition of the neural network model is met, so as to obtain the traffic signal lamp identification model.
An embodiment of the present application provides a traffic light identification device 50, as shown in fig. 5, the traffic light identification device 50 may include:
the image to be measured obtaining module 510 is used for obtaining an image to be measured at a preset position, wherein the image to be measured comprises imaging content of a traffic signal lamp;
the position determining module 520 is configured to input an image to be detected into the traffic light identification model, and obtain a position of a bounding box in the image to be detected, the bounding box being output by the traffic light identification model and used for marking imaging content;
The color recognition module 530 is configured to recognize the color of the area in the bounding box through a preset color model, take the color of the area in the bounding box as the color of the traffic signal lamp, and determine the state of the traffic signal lamp according to the color of the traffic signal lamp;
the traffic light recognition model is trained according to the above embodiments.
The embodiment of the application provides a possible implementation manner, in which the state of the traffic signal lamp includes permitted to pass, warning, and prohibited from passing;
the color recognition module includes a state determination sub-module, configured to determine that the state of the traffic signal lamp is permitted to pass if the color of the traffic signal lamp is green;
determine that the state of the traffic signal lamp is warning if the color of the traffic signal lamp is yellow;
and determine that the state of the traffic signal lamp is prohibited from passing if the color of the traffic signal lamp is red.
The device of the embodiment of the present application may perform the method provided by the embodiment of the present application, and its implementation principle is similar, and actions performed by each module in the device of the embodiment of the present application correspond to steps in the method of the embodiment of the present application, and detailed functional descriptions of each module of the device may be referred to the descriptions in the corresponding methods shown in the foregoing, which are not repeated herein.
The embodiment of the application provides an electronic device, including a memory, a processor, and a computer program stored on the memory, where the processor executes the computer program to implement the steps of the training method of the traffic signal lamp recognition model or of the traffic signal lamp recognition method described above.
In an alternative embodiment, an electronic device is provided, as shown in fig. 6, and an electronic device 6000 as shown in fig. 6 includes a processor 6001 and a memory 6003. In which a processor 6001 is coupled to a memory 6003, such as via a bus 6002. Optionally, the electronic device 6000 may also include a transceiver 6004, the transceiver 6004 may be used for data interactions between the electronic device and other electronic devices, such as transmission of data and/or reception of data and the like. It should be noted that, in practical applications, the transceiver 6004 is not limited to one, and the structure of the electronic device 6000 is not limited to the embodiment of the present application.
The processor 6001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor 6001 may also be a combination that performs computing functions, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 6002 may include a path for transferring information between the aforementioned components. Bus 6002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 6002 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean that there is only one bus or one type of bus.
The memory 6003 may be a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media, other magnetic storage devices, or any other medium that can be used to carry or store a computer program and that can be read by a computer, without limitation.
The memory 6003 is for storing a computer program for executing an embodiment of the present application, and is controlled to be executed by the processor 6001. The processor 6001 is configured to execute a computer program stored in the memory 6003 to implement the steps shown in the foregoing method embodiments.
The electronic device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as stationary terminals such as digital TVs and desktop computers. The electronic device shown in fig. 6 is merely an example and should not impose any limitation on the functionality and scope of use of embodiments of the present disclosure.
Embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps and corresponding content of the foregoing method embodiments. Compared with the prior art, the traffic signal lamp recognition model trained by the embodiment of the application can identify the bounding box of the imaging content of a traffic signal lamp in an image and remove interference factors in the image, thereby accurately identifying the traffic signal lamp and improving recognition accuracy.
The embodiment of the application also provides a computer program product, including a computer program which, when executed by a processor, implements the steps and corresponding content of the foregoing method embodiments. Compared with the prior art, the traffic signal lamp recognition model trained by the embodiment of the application can identify the bounding box of the imaging content of a traffic signal lamp in an image and remove interference factors in the image, thereby accurately identifying the traffic signal lamp and improving recognition accuracy.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in other sequences than those illustrated or otherwise described.
It should be understood that, although various operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are implemented is not limited to the order indicated by the arrows. In some implementations of embodiments of the application, the implementation steps in the flowcharts may be performed in other orders as desired, unless explicitly stated herein. Furthermore, some or all of the steps in the flowcharts may include multiple sub-steps or multiple stages based on the actual implementation scenario. Some or all of these sub-steps or phases may be performed at the same time, or each of these sub-steps or phases may be performed at different times, respectively. In the case of different execution time, the execution sequence of the sub-steps or stages can be flexibly configured according to the requirement, which is not limited by the embodiment of the present application.
The foregoing is only an optional implementation manner of some implementation scenarios of the present application, and it should be noted that, for those skilled in the art, other similar implementation manners based on the technical ideas of the present application are adopted without departing from the technical ideas of the scheme of the present application, which also belongs to the protection scope of the embodiments of the present application.