
CN114565897B - Traffic light intersection blind guiding method and device - Google Patents


Info

Publication number
CN114565897B
CN114565897B (application number CN202210059247.5A)
Authority
CN
China
Prior art keywords
traffic light
image
traffic lights
traffic
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210059247.5A
Other languages
Chinese (zh)
Other versions
CN114565897A (en)
Inventor
杨福才
宫平
俞益洲
李一鸣
乔昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Original Assignee
Beijing Shenrui Bolian Technology Co Ltd
Shenzhen Deepwise Bolian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shenrui Bolian Technology Co Ltd, Shenzhen Deepwise Bolian Technology Co Ltd filed Critical Beijing Shenrui Bolian Technology Co Ltd
Priority to CN202210059247.5A priority Critical patent/CN114565897B/en
Publication of CN114565897A publication Critical patent/CN114565897A/en
Application granted granted Critical
Publication of CN114565897B publication Critical patent/CN114565897B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61H: PHYSICAL THERAPY APPARATUS, e.g. DEVICES FOR LOCATING OR STIMULATING REFLEX POINTS IN THE BODY; ARTIFICIAL RESPIRATION; MASSAGE; BATHING DEVICES FOR SPECIAL THERAPEUTIC OR HYGIENIC PURPOSES OR SPECIFIC PARTS OF THE BODY
    • A61H3/00: Appliances for aiding patients or disabled persons to walk about
    • A61H3/06: Walking aids for blind persons
    • A61H3/061: Walking aids for blind persons with electronic detecting or guiding means
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • Pain & Pain Management (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Rehabilitation Therapy (AREA)
  • Animal Behavior & Ethology (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a traffic light intersection blind guiding method and device. The method comprises the following steps: acquiring a video image in front of the blind guiding device in real time; detecting whether the image contains a traffic light using a recognition model and, if so, outputting the state of the traffic light and the distance between the traffic light and the blind guiding device; if multiple groups of traffic lights are detected, identifying one group of target traffic lights based on each group's distance from the blind guiding device and its distance from the center of the image field of view; if only one group is detected, treating that group as the target traffic lights; and playing reminding information related to the target traffic lights in real time through a voice module. The invention can not only automatically detect the traffic light state at an intersection, but also identify the target traffic lights among multiple detected groups, which effectively reduces environmental interference, improves traffic light recognition accuracy, and provides more stable and reliable information to the wearer, thereby increasing the wearer's safety when crossing a traffic light intersection.

Description

Traffic light intersection blind guiding method and device
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a traffic light intersection blind guiding method and device.
Background
China has the largest blind population in the world: about 5 million people, accounting for 18% of the world's blind population. In absolute terms, China's blind population has long exceeded the total population of countries such as Denmark, Finland or Norway, and about 450,000 people in China go blind each year. The Chinese health authorities have also pointed out that, owing to rapid population aging, population growth and other factors, the number of blind people in China keeps increasing. In contrast, existing public facilities in China do not fully consider the usage scenarios of blind people, such as the placement of traffic lights, making it extremely inconvenient and even dangerous for blind people to cross traffic light intersections.
With the development of deep learning, computer vision, battery technology and the like, wearable devices that can be worn for long periods without added safety risk have emerged, smart glasses being the most common. Such devices are easy to store and wear, and are both attractive and practical. Conventional smart glasses generally measure distance with an infrared ranging module, but in the scenario of crossing at a traffic light, the infrared module usually cannot measure the distance between the wearer and the opposite traffic light because of wearing height and occlusion.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a traffic light intersection blind guiding method and device.
In order to achieve the above object, the present invention adopts the following technical scheme.
In a first aspect, the present invention provides a traffic light intersection blind guiding method, including the following steps executed in a blind guiding device:
acquiring a video image in front of the blind guiding device in real time; detecting whether the image comprises a traffic light or not by using an identification model, and if so, outputting the state of the traffic light and the distance between the traffic light and the blind guiding device;
if a plurality of groups of traffic lights are detected, a group of target traffic lights are identified based on the distance between the traffic lights and the blind guiding device and the distance between the traffic lights and the center of the image visual field; if only one group of traffic lights is detected, the traffic lights are target traffic lights;
and playing reminding information related to the target traffic light in real time through the voice module.
Further, the recognition model mainly comprises a plurality of basic convolution modules, a main network is adopted to extract features of an input video image, two different algorithms are respectively adopted based on the extracted image features, and a traffic light detection result and the distance between a traffic light and a blind guiding device are respectively output at two output ends.
Still further, the basic convolution module adopts a MobileNet V2 structure in which all ReLU activation functions are replaced with PReLU.
Further, the training method of the identification model comprises the following steps:
acquiring a video image of an actual traffic light intersection;
labeling the image, wherein the labeling comprises position information of red and green lamps in the image, traffic light color information and pixel-level depth map information;
and constructing a training data set, and training the recognition model by using the training data set.
Further, when detecting a plurality of groups of traffic lights, the target traffic light identification method comprises the following steps:
sorting the detected groups of traffic lights in ascending order of their distance from the blind guiding device, and selecting the N nearest groups, where N ≥ 1;
and respectively calculating the distances between the N groups of traffic lights and the center of the image visual field, wherein the group of traffic lights with the minimum distance is the target traffic light.
Further, the method for calculating the distance between the traffic light and the center of the image view comprises the following steps:
establishing a rectangular coordinate system which takes the upper left corner of the image as an origin and takes two right-angle sides passing through the upper left corner as a transverse axis and a longitudinal axis respectively;
calculating coordinates of the center of the traffic light according to coordinates of the upper left corner and the lower right corner of the traffic light:
x0 = (x1 + x2)/2, y0 = (y1 + y2)/2
where (x1, y1), (x2, y2) and (x0, y0) are the coordinates of the upper left corner, the lower right corner and the center of the traffic light, respectively;
and obtaining coordinates of the center of the image field of view according to the length and the width of the image:
X=a/2,Y=b/2
wherein a and b are the length and width of the image respectively, and (X, Y) is the coordinates of the center of the image field of view;
the distance between the traffic light and the center of the image vision is as follows:
Figure BDA0003477566440000031
further, the recognition model also outputs a remaining time.
In a second aspect, the present invention provides a traffic light intersection blind guiding device, including:
the traffic light detection module is used for acquiring video images in front of the blind guiding device in real time; detecting whether the image comprises a traffic light or not by using an identification model, and if so, outputting the state of the traffic light and the distance between the traffic light and the blind guiding device;
the target traffic light identification module is used for identifying a group of target traffic lights based on the distance between the traffic lights and the blind guiding device and the distance between the traffic lights and the center of the visual field of the image if a plurality of groups of traffic lights are detected; if only one group of traffic lights is detected, the traffic lights are target traffic lights;
and the information reminding module is used for playing reminding information related to the target traffic light in real time through the voice module.
Further, when detecting a plurality of groups of traffic lights, the target traffic light identification method comprises the following steps:
sorting the detected groups of traffic lights in ascending order of their distance from the blind guiding device, and selecting the N nearest groups, where N ≥ 1;
and respectively calculating the distances between the N groups of traffic lights and the center of the image visual field, wherein the group of traffic lights with the minimum distance is the target traffic light.
Further, the method for calculating the distance between the traffic light and the center of the image view comprises the following steps:
establishing a rectangular coordinate system which takes the upper left corner of the image as an origin and takes two right-angle sides passing through the upper left corner as a transverse axis and a longitudinal axis respectively;
calculating coordinates of the center of the traffic light according to coordinates of the upper left corner and the lower right corner of the traffic light:
x0 = (x1 + x2)/2, y0 = (y1 + y2)/2
where (x1, y1), (x2, y2) and (x0, y0) are the coordinates of the upper left corner, the lower right corner and the center of the traffic light, respectively;
and obtaining coordinates of the center of the image field of view according to the length and the width of the image:
X=a/2,Y=b/2
wherein a and b are the length and width of the image respectively, and (X, Y) is the coordinates of the center of the image field of view;
the distance between the traffic light and the center of the image vision is as follows:
Figure BDA0003477566440000041
compared with the prior art, the invention has the following beneficial effects.
According to the invention, the video image in front of the blind guiding device is obtained in real time, the traffic light state in the image and the distance between the traffic light and the blind guiding device are detected by using the identification model, if a plurality of groups of traffic lights are detected, a group of target traffic lights are identified based on the distance between the traffic lights and the blind guiding device and the distance between the traffic lights and the center of the visual field of the image, and the reminding information related to the target traffic lights is played in real time by using the voice module, so that the automatic blind guiding of the traffic light intersection is realized. The invention not only can automatically detect the traffic light state of the crossing and automatically calculate the distance between the traffic light and the blind guiding device, but also can identify the target traffic light from a plurality of groups of detected traffic lights, can effectively reduce the environmental interference, such as automobile steering lights, billboards, other traffic lights and the like, improve the identification precision of the traffic lights, provide more stable and reliable information for the wearer, and further improve the safety of the wearer passing the traffic light crossing.
Drawings
Fig. 1 is a flowchart of a traffic light intersection blind guiding method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of the recognition model.
FIG. 3 is a schematic diagram of a scale-invariant basis convolution module.
Fig. 4 is a schematic diagram of the structure of the downsampling basis convolution module.
Fig. 5 is a block diagram of a traffic light intersection blind guiding device according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the drawings and the detailed description below, in order to make the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a flowchart of a traffic light intersection blind guiding method according to an embodiment of the present invention, including the following steps executed in a blind guiding device:
step 101, acquiring a video image in front of a blind guiding device in real time; detecting whether the image comprises a traffic light or not by using an identification model, and if so, outputting the state of the traffic light and the distance between the traffic light and the blind guiding device;
102, if a plurality of groups of traffic lights are detected, a group of target traffic lights are identified based on the distance between the traffic lights and the blind guiding device and the distance between the traffic lights and the center of the image visual field; if only one group of traffic lights is detected, the traffic lights are target traffic lights;
and 103, playing reminding information related to the target traffic light in real time through a voice module.
The embodiment provides a traffic light intersection blind guiding method. The method is implemented by a blind guiding device worn on the user's head. The blind guiding device generally comprises a camera, a data processing module and a voice module. The camera shoots video images in front of the blind guiding device in real time and feeds them to the data processing module. The data processing module is the processing and control center of the blind guiding device; it coordinates the work of the other modules through control signals and carries out tasks such as intersection traffic light detection and distance calculation by running the algorithms described below. The voice module is an intelligent module for interacting with the wearer, such as playing the blind guiding (navigation) reminding information in real time under the control of the data processing module and receiving the wearer's voice instructions.
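The interplay of the three modules can be sketched as a capture–recognize–announce loop. This is an illustrative sketch only, not the patented implementation; the camera, recognition-model and speaker interfaces are hypothetical stand-ins.

```python
def guide_loop(camera, model, speaker, max_frames=None):
    """Illustrative main loop: capture a frame, run the recognition model,
    and announce any detected traffic light via the voice module.
    All three collaborators are hypothetical objects/callables."""
    frames = 0
    announcements = []
    while max_frames is None or frames < max_frames:
        frame = camera.read()              # video frame from the front camera
        result = model(frame)              # None, or (state, distance in metres)
        if result is not None:
            state, distance_m = result
            message = f"{state} light, about {distance_m:.0f} metres ahead"
            speaker.say(message)
            announcements.append(message)
        frames += 1
    return announcements
```

In a real device the loop would run continuously; `max_frames` exists only so the sketch terminates.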
In this embodiment, step 101 is mainly used for detecting traffic lights. As the wearer walks, the camera in the blind guiding device shoots video images of a certain field of view in front in real time and sends them to the data processing module. The data processing module processes the input video image with the established recognition model to detect whether the image contains a traffic light; if so, it outputs the state of the traffic light at the current moment according to its color, together with the distance between the traffic light and the device. The distance value not only lets the wearer know in time how far away the traffic light is, but is also used for identifying the target traffic light in a later step. The recognition model is generally a deep convolutional neural network, trained on video images collected at real traffic light intersections.
In this embodiment, step 102 is mainly used for identifying the target traffic light. When the wearer arrives at a traffic light intersection, sidewalk traffic lights in different directions, or vehicle-lane traffic lights, may enter the camera's field of view at the same time; that is, multiple groups of traffic lights may be detected in one frame, but only one group governs the crosswalk to be crossed, which is called the target traffic light. It is therefore necessary to identify the target traffic light from the detected groups, otherwise errors would result. The present embodiment performs target traffic light identification based on two characteristics: different traffic lights are at different distances from the blind guiding device, and in general the closer a traffic light is, the more likely it is the target; different traffic lights also occupy different positions in the image, and a traffic light closer to the image center, i.e. the center of the field of view, is more likely to be the target. Of course, it is also possible that only one group of traffic lights is detected (within a specific viewing angle range); no identification is then needed, and that group is the target traffic light. A specific identification method is given in a later embodiment.
In this embodiment, step 103 is mainly used for playing the reminding information. The reminding information is played by a voice module in the blind guiding device, and mainly comprises information related to traffic lights, such as whether the traffic lights exist at the coming intersection; if there is a traffic light, how far from the traffic light is, whether the status of the traffic light is red or green, etc., and sometimes the remaining time information.
According to this embodiment, the traffic light state at an intersection can be detected automatically and the target traffic light identified among multiple detected groups; environmental interference such as vehicle turn signals, billboards and other traffic lights is effectively reduced, traffic light recognition accuracy is improved, and more stable and reliable information is provided to the wearer, thereby improving the wearer's safety when passing through a traffic light intersection.
As an alternative embodiment, the recognition model mainly comprises a plurality of basic convolution modules, a backbone network is adopted to extract features of an input video image, two different algorithms are respectively adopted based on the extracted image features, and a traffic light detection result and a distance between a traffic light and a blind guiding device are respectively output at two output ends.
The present embodiment provides a network structure for implementing the recognition model. The recognition model consists of a number of basic convolution modules; its network structure is shown schematically in Fig. 2. The arrows in Fig. 2 represent the basic convolution modules, the rectangles of different sizes represent input images or feature maps of different resolutions, and the concatenation symbol indicates that two input feature maps are spliced together. The present embodiment employs a shared backbone network to extract features from the input video image, and then uses separate prediction heads with different algorithms for the two tasks, namely distance prediction and traffic light detection. Such a structure keeps the network lightweight.
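The shared-backbone, two-head structure described above can be sketched as follows. This is a structural illustration only; the backbone and the two heads are hypothetical callables standing in for the actual convolutional networks.

```python
def recognize(frame, backbone, detect_head, depth_head):
    """Structural sketch of the two-headed recognition model: one shared
    backbone extracts image features once, then two separate heads produce
    the traffic light detections and the distance (depth) estimate."""
    features = backbone(frame)          # shared feature extraction
    detections = detect_head(features)  # traffic-light boxes + red/green state
    depth = depth_head(features)        # distance prediction
    return detections, depth
```

Sharing the backbone means the expensive feature extraction runs once per frame regardless of how many heads consume the features.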
As an alternative embodiment, the basic convolution module adopts a MobileNet V2 structure in which all ReLU activation functions are replaced with PReLU.
The embodiment provides a technical scheme for the basic convolution module. To reduce computation, the basic convolution module of this embodiment follows the design of MobileNet V2 and comes in two variants: a scale-invariant basic convolution module and a downsampling basic convolution module. The scale-invariant module is mainly used to increase model depth, extract richer features and enlarge the receptive field; the downsampling module is mainly used to reduce the feature map scale, which lowers computation and also enlarges the receptive field. The basic convolution module of this embodiment differs from MobileNet V2 in that all ReLU activation functions are replaced with PReLU. Since the ReLU outputs 0 when its argument is less than 0, part of the feature information is lost; the PReLU differs in that the output is no longer 0 for negative inputs (f(x) = ax for x < 0, where a is a learnable slope), so after this change the richness of the feature information is preserved and the expressive power of the features is enhanced with essentially no increase in computation. The structures of the improved scale-invariant and downsampling basic convolution modules are shown in Figs. 3 and 4, respectively. The scale-invariant module receives an input feature map, expands its channel count to 4 times the original with a pointwise convolution (Pointwise Conv), maintains the channel count through a depthwise convolution (Depthwise Conv), and finally restores the original channel count with a linear pointwise convolution.
The resulting feature map is then added element-wise to the input feature map carried by the shortcut connection, and the final output is obtained through a PReLU activation layer. The downsampling module differs from the scale-invariant module in that a depthwise convolution with stride 2 is used on the shortcut connection, and the depthwise convolution on the main branch also has stride 2, thereby achieving downsampling.
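The ReLU-versus-PReLU distinction discussed above reduces to two scalar functions. The minimal sketch below (slope `a = 0.25` is an illustrative default; in the network it is a learned parameter) shows why PReLU retains negative-side information that ReLU discards.

```python
def relu(x):
    # ReLU: negative inputs are zeroed, so their information is lost
    return max(0.0, x)

def prelu(x, a=0.25):
    # PReLU: f(x) = x for x >= 0, f(x) = a*x for x < 0.
    # Negative activations are scaled rather than discarded;
    # the slope a is a learnable parameter in the actual network.
    return x if x >= 0 else a * x
```

Because `prelu(x) != 0` for `x < 0`, gradients still flow through negative activations during training, which is the expressive-power gain the text describes.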
As an optional embodiment, the training method of the identification model includes:
step 1011, obtaining a video image of an actual traffic light intersection;
step 1012, labeling the image, including position information of red and green lamps in the image, traffic light color information and pixel level depth map information;
in step 1013, a training data set is constructed, and the recognition model is trained using the training data set.
The present embodiment provides a training method for the recognition model, which is implemented in steps 1011-1013.
Step 1011 is mainly used for acquiring video images of traffic light intersections. A camera is used to collect intersection traffic light images from real scenes; the collected images should preferably be balanced across the different traffic light states to avoid the problem of data imbalance.
Step 1012 is primarily used to annotate the acquired images. The annotations include the positions of the traffic lights in the image, traffic light color information and pixel-level depth map information, and may also include remaining-time information. A public algorithm can be used for an initial pass of annotation, with the results checked manually, which keeps annotation fast and accurate. If the data volume is too large, semi-supervised learning can be used for the initial annotation: label a small portion of the data first, train the model on it, then predict on the unlabeled data, add the samples with higher prediction confidence to the labeled pool, and repeat this process until all samples are labeled or no remaining samples can be labeled with sufficient confidence, after which manual verification is performed. This greatly reduces labor and time.
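The semi-supervised labeling procedure above can be sketched as a generic self-training loop. This is an illustrative sketch under stated assumptions: `train` and `predict` are hypothetical user-supplied callables, and the confidence threshold is a free parameter.

```python
def self_label(labeled, unlabeled, train, predict, threshold=0.9, max_rounds=10):
    """Self-training loop: train on the labeled pool, pseudo-label the most
    confident unlabeled samples, and repeat. Returns the enlarged labeled
    pool and any samples left for manual annotation."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        model = train(labeled)
        still_unlabeled = []
        for x in unlabeled:
            label, confidence = predict(model, x)
            if confidence >= threshold:
                labeled.append((x, label))   # accept high-confidence pseudo-label
            else:
                still_unlabeled.append(x)
        if len(still_unlabeled) == len(unlabeled):
            break                            # no progress: hand off to manual review
        unlabeled = still_unlabeled
    return labeled, unlabeled
```

Whatever remains in the second returned list corresponds to the samples the text says must be verified or labeled manually.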
Step 1013 is mainly used to construct a training data set and train the recognition model using the training data set. And (5) finishing the data marked in the front to obtain a training data set. In practice, in addition to building training sets, verification sets and test sets are also built. The data obtained in the foregoing may be divided into data sets according to a proportion of 80% -10% -10%, wherein 80% is a training set, 10% is a verification set, and 10% is a test set. When the data set is divided, random sampling without replacement is required to be carried out, so that the consistency of the distribution of the training set, the verification set and the test set can be ensured.
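The 80%/10%/10% split with sampling without replacement can be sketched as a single shuffle followed by slicing, so the three sets are guaranteed disjoint. The ratios and seed here are illustrative defaults.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split samples into train/validation/test sets by shuffling the
    indices once (sampling without replacement), so the three sets are
    disjoint and together cover the whole dataset."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)    # one shuffle = no replacement
    n_train = int(len(samples) * ratios[0])
    n_val = int(len(samples) * ratios[1])
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test
```

Shuffling before slicing is what gives the distributional consistency across the three sets that the text requires.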
As an optional embodiment, when multiple sets of traffic lights are detected, the method for identifying the target traffic lights includes:
step 1021, sorting the detected multiple groups of traffic lights according to the sequence from small to large distance between the traffic lights and the blind guiding device, and selecting N groups of traffic lights arranged at the forefront, wherein N is more than or equal to 1;
step 1022, calculating the distances between the N groups of traffic lights and the center of the image view, wherein the group of traffic lights with the minimum distance is the target traffic light.
The embodiment provides a technical scheme for identifying the target traffic lights. As mentioned above, in general, the closer the traffic light is to the blind guiding device, the more likely it is to be the target traffic light, and the closer the traffic light is to the center of view of the image, the more likely it is to be the target traffic light. The target traffic light identification is performed according to the two characteristics in this embodiment, and is specifically implemented by step 1021 and step 1022.
Step 1021 is mainly used for preliminary screening by distance. First, the detected groups of traffic lights are sorted in ascending order of their distance from the blind guiding device, and the N nearest groups are selected. Because sidewalk traffic lights are generally closer to the blind guiding device than vehicle-lane traffic lights, the distance information usually filters out the vehicle-lane lights. The value of N is chosen empirically and is related to the number M of detected traffic lights: in general, the larger M is, the larger N should be. Since M is not fixed, N can be set dynamically according to the actual value of M. In this embodiment the minimum value of N is 1, allowing for cases where M is small; in practice N is usually at least 2, since step 1022 adds little when N = 1.
Step 1022 is mainly used for determining the target traffic light according to the position of the traffic light in the image. The distance between the N groups of traffic lights left after screening and the center of the image view is calculated, and the smaller the distance is, the closer the traffic lights are to the center of the image view, so that the traffic lights with the minimum distance are selected from the traffic lights, and the target traffic lights can be obtained.
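Steps 1021 and 1022 together amount to a two-stage selection, which can be sketched as follows. The dictionary layout of each detection (`'box'` as corner coordinates, `'depth'` as distance to the device) is an illustrative assumption, not the patent's data format.

```python
import math

def pick_target(detections, image_width, image_height, n=3):
    """Two-stage target selection: keep the N detections nearest to the
    wearer (smallest depth), then among those pick the one whose box
    centre is closest to the image centre."""
    nearest = sorted(detections, key=lambda d: d["depth"])[:max(1, n)]
    cx, cy = image_width / 2, image_height / 2     # centre of the field of view
    def distance_to_centre(d):
        x1, y1, x2, y2 = d["box"]
        x0, y0 = (x1 + x2) / 2, (y1 + y2) / 2      # centre of the bounding box
        return math.hypot(x0 - cx, y0 - cy)        # sqrt((x0-X)^2 + (y0-Y)^2)
    return min(nearest, key=distance_to_centre)
```

`math.hypot` implements exactly the Euclidean distance formula given in the preceding steps.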
It should be noted that the technical solution for identifying the target traffic light based on the above two characteristics is by no means limited to this embodiment; the above is only one preferred implementation offered for the reference of those skilled in the art, and other possible implementations are neither excluded nor negated.
As an optional embodiment, the method for calculating the distance between the traffic light and the center of the image field of view includes:
establishing a rectangular coordinate system with the upper left corner of the image as the origin and the two image edges passing through that corner as the horizontal and vertical axes;
calculating the coordinates of the center of the traffic light from the coordinates of the upper left and lower right corners of its bounding box:

x0 = (x1 + x2)/2, y0 = (y1 + y2)/2

where (x1, y1), (x2, y2), and (x0, y0) are the coordinates of the upper left corner, the lower right corner, and the center of the traffic light, respectively;
obtaining the coordinates of the center of the image field of view from the length and width of the image:

X = a/2, Y = b/2

where a and b are the length and width of the image, respectively, and (X, Y) are the coordinates of the center of the image field of view;
the distance between the traffic light and the center of the image field of view is then:

d = √((x0 − X)² + (y0 − Y)²)
This embodiment provides a technical scheme for calculating the distance between a traffic light and the center of the image field of view. The calculation is straightforward: a coordinate system is established first, and the distance is then obtained by elementary analytic geometry. The specific calculation is given above and is not described in further detail here.
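A minimal sketch of the calculation above, following the embodiment's coordinate convention (origin at the image's upper left corner); the function name is illustrative:

```python
from math import sqrt

def distance_to_image_center(x1, y1, x2, y2, a, b):
    """Distance between a traffic light's box center and the image center.

    (x1, y1) and (x2, y2) are the upper-left and lower-right corners of
    the light's bounding box; a and b are the image length and width.
    """
    x0, y0 = (x1 + x2) / 2, (y1 + y2) / 2  # box center
    X, Y = a / 2, b / 2                    # image-center coordinates
    return sqrt((x0 - X) ** 2 + (y0 - Y) ** 2)
```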
As an alternative embodiment, the recognition model also outputs the remaining time.
In this embodiment, the recognition model also outputs the time information of the traffic light. Existing traffic light intersections generally display the remaining time, and by annotating this time information in the images during training, the recognition model can learn to identify the remaining-time data. Announcing the remaining time in real time through the voice module brings greater convenience to the wearer of the blind guiding device.
Fig. 5 is a schematic diagram of a traffic light intersection blind guiding device according to an embodiment of the present invention, where the device includes:
the traffic light detection module 11, configured to acquire video images in front of the blind guiding device in real time, detect whether the image includes a traffic light using a recognition model, and if so, output the state of the traffic light and the distance between the traffic light and the blind guiding device;
the target traffic light identification module 12, configured to identify a group of target traffic lights based on the distance between each traffic light and the blind guiding device and the distance by which each traffic light deviates from the center of the image field of view, if multiple groups of traffic lights are detected; if only one group of traffic lights is detected, that group is the target traffic light;
the information reminding module 13, configured to play reminder information related to the target traffic light in real time through the voice module.
The device of this embodiment may be used to implement the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effects are similar and are not repeated here. The same applies to the later device embodiments, which are likewise not explained again.
As an optional embodiment, when multiple groups of traffic lights are detected, the target traffic light is identified by:
sorting the detected groups of traffic lights in ascending order of their distance from the blind guiding device and selecting the N foremost groups, where N ≥ 1;
calculating the distance between each of the N groups of traffic lights and the center of the image field of view; the group with the minimum distance is the target traffic light.
As an optional embodiment, the method for calculating the distance between the traffic light and the center of the image field of view includes:
establishing a rectangular coordinate system with the upper left corner of the image as the origin and the two image edges passing through that corner as the horizontal and vertical axes;
calculating the coordinates of the center of the traffic light from the coordinates of the upper left and lower right corners of its bounding box:

x0 = (x1 + x2)/2, y0 = (y1 + y2)/2

where (x1, y1), (x2, y2), and (x0, y0) are the coordinates of the upper left corner, the lower right corner, and the center of the traffic light, respectively;
obtaining the coordinates of the center of the image field of view from the length and width of the image:

X = a/2, Y = b/2

where a and b are the length and width of the image, respectively, and (X, Y) are the coordinates of the center of the image field of view;
the distance between the traffic light and the center of the image field of view is then:

d = √((x0 − X)² + (y0 − Y)²)
The foregoing is merely a description of specific embodiments of the present invention, and the invention is not limited thereto; any changes or substitutions readily conceived by those skilled in the art within the scope disclosed herein shall fall within the protection scope of the invention. The protection scope of the invention is therefore defined by the appended claims.

Claims (6)

1. A traffic light intersection blind guiding method, characterized by comprising the following steps:
acquiring a video image in front of the blind guiding device in real time; detecting whether the image includes a traffic light using a recognition model, and if so, outputting the state of the traffic light and the distance between the traffic light and the blind guiding device; the recognition model mainly comprises a plurality of basic convolution modules; a backbone network is used to extract features from the input video image, two different algorithms are applied to the extracted image features, and the traffic light detection result and the distance between the traffic light and the blind guiding device are output at two respective output ends; the basic convolution module adopts a MobileNetV2 structure in which all ReLU activation functions are replaced with PReLU;
if multiple groups of traffic lights are detected, identifying a group of target traffic lights based on the distance between each traffic light and the blind guiding device and the distance between each traffic light and the center of the image field of view; if only one group of traffic lights is detected, that group is the target traffic light; when multiple groups of traffic lights are detected, the target traffic light is identified by: sorting the detected groups of traffic lights in ascending order of their distance from the blind guiding device and selecting the N foremost groups, where N ≥ 1; and calculating the distance between each of the N groups of traffic lights and the center of the image field of view, the group with the minimum distance being the target traffic light;
and playing reminding information related to the target traffic light in real time through the voice module.
2. The traffic light intersection blind guiding method according to claim 1, wherein the training method of the recognition model comprises the following steps:
acquiring a video image of an actual traffic light intersection;
labeling the images, the labels including the position information of the traffic lights in the image, the traffic light color information, and pixel-level depth map information;
and constructing a training data set, and training the recognition model by using the training data set.
3. The traffic light intersection blind guiding method according to claim 1, wherein calculating the distance between the traffic light and the center of the image field of view comprises:
establishing a rectangular coordinate system with the upper left corner of the image as the origin and the two image edges passing through that corner as the horizontal and vertical axes;
calculating the coordinates of the center of the traffic light from the coordinates of the upper left and lower right corners of its bounding box:

x0 = (x1 + x2)/2, y0 = (y1 + y2)/2

where (x1, y1), (x2, y2), and (x0, y0) are the coordinates of the upper left corner, the lower right corner, and the center of the traffic light, respectively;
obtaining the coordinates of the center of the image field of view from the length and width of the image:

X = a/2, Y = b/2

where a and b are the length and width of the image, respectively, and (X, Y) are the coordinates of the center of the image field of view;
the distance between the traffic light and the center of the image field of view is then:

d = √((x0 − X)² + (y0 − Y)²)
4. the traffic light intersection blind guiding method according to claim 1, wherein the recognition model further outputs a remaining time.
5. A traffic light intersection blind guiding device, the device comprising:
the traffic light detection module, configured to acquire video images in front of the blind guiding device in real time, detect whether the image includes a traffic light using a recognition model, and if so, output the state of the traffic light and the distance between the traffic light and the blind guiding device; the recognition model mainly comprises a plurality of basic convolution modules; a backbone network is used to extract features from the input video image, two different algorithms are applied to the extracted image features, and the traffic light detection result and the distance between the traffic light and the blind guiding device are output at two respective output ends; the basic convolution module adopts a MobileNetV2 structure in which all ReLU activation functions are replaced with PReLU;
the target traffic light identification module, configured to identify a group of target traffic lights based on the distance between each traffic light and the blind guiding device and the distance between each traffic light and the center of the image field of view, if multiple groups of traffic lights are detected; if only one group of traffic lights is detected, that group is the target traffic light; when multiple groups of traffic lights are detected, the target traffic light is identified by: sorting the detected groups of traffic lights in ascending order of their distance from the blind guiding device and selecting the N foremost groups, where N ≥ 1; and calculating the distance between each of the N groups of traffic lights and the center of the image field of view, the group with the minimum distance being the target traffic light;
and the information reminding module is used for playing reminding information related to the target traffic light in real time through the voice module.
6. The traffic light intersection blind guiding device according to claim 5, wherein calculating the distance between the traffic light and the center of the image field of view comprises:
establishing a rectangular coordinate system with the upper left corner of the image as the origin and the two image edges passing through that corner as the horizontal and vertical axes;
calculating the coordinates of the center of the traffic light from the coordinates of the upper left and lower right corners of its bounding box:

x0 = (x1 + x2)/2, y0 = (y1 + y2)/2

where (x1, y1), (x2, y2), and (x0, y0) are the coordinates of the upper left corner, the lower right corner, and the center of the traffic light, respectively;
obtaining the coordinates of the center of the image field of view from the length and width of the image:

X = a/2, Y = b/2

where a and b are the length and width of the image, respectively, and (X, Y) are the coordinates of the center of the image field of view;
the distance between the traffic light and the center of the image field of view is then:

d = √((x0 − X)² + (y0 − Y)²)
CN202210059247.5A 2022-01-19 2022-01-19 Traffic light intersection blind guiding method and device Active CN114565897B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210059247.5A CN114565897B (en) 2022-01-19 2022-01-19 Traffic light intersection blind guiding method and device


Publications (2)

Publication Number Publication Date
CN114565897A CN114565897A (en) 2022-05-31
CN114565897B true CN114565897B (en) 2023-04-28

Family

ID=81712116


Country Status (1)

Country Link
CN (1) CN114565897B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant