Background
At present, fire detection technologies widely used on construction sites include smoke-sensitive, temperature-sensitive, photosensitive, and composite detectors. However, with rapid socioeconomic development, urbanization has accelerated and tall, large, novel, and unusually shaped buildings keep emerging, so the drawbacks of each conventional technique have gradually become apparent. In a large-space building, smoke may never reach a ceiling-mounted smoke detector because of a thermal barrier layer; alternatively, airflow may disperse the smoke so that the concentration reaching the top of the building falls far below the detector's response threshold and no alarm is generated; in addition, excessive dust concentration can cause false alarms. Temperature-sensing detectors have low sensitivity and long response times, are nearly useless during the initial smoldering stage, and cover a very limited area. Photosensitive detectors are expensive to manufacture, and their reliability and effectiveness are unstable, which limits their practical use. A composite fire detector integrates smoke, temperature, and photosensitive sensing and improves overall performance, but it does not fully eliminate the defects of its components and still cannot handle detection and alarm for large-space fires. Given the great uncertainty, suddenness, and variability of fires, conventional fire detectors are unsuitable for large factories, warehouses, and expansive outdoor spaces such as forest parks.
In addition, traditional fire detectors cannot provide detailed information about the fire scene, such as the fire's location and intensity, and therefore cannot fully meet the requirements of modern fire detection.
In recent years, large numbers of security monitoring systems have been installed in buildings, making fire detection from surveillance video a new research direction. With the development of image processing technology, researchers have found that flame images have distinctive visual characteristics in texture, color, and other aspects. After preprocessing the real-time surveillance video (denoising, enhancement, gray-level transformation, etc.), static and dynamic visual features of flame and smoke are extracted and then classified with neural networks, pattern recognition, and related techniques. Compared with traditional fire detection, vision-based detection offers fast response, high accuracy, and rich, intuitive information. However, its effectiveness depends largely on the manual selection and extraction of fire image features: when the hand-crafted features are reasonable and effective, recognition works well, but choosing them depends heavily on expert knowledge and extensive practice.
Terms such as artificial intelligence and deep learning have become household words, and a new generation of AI technologies represented by deep learning, including face recognition, speech recognition, and image recognition, has entered people's daily lives. Deep learning, a branch of artificial intelligence research, studies how to give computers a human-like ability to learn and to continuously acquire new knowledge. It is called deep learning because it can autonomously discover the essential features of data from massive datasets, and it has brought an innovative revolution to the field of artificial intelligence.
The present application provides a flame recognition and detection method based on the convolutional neural network (CNN), a deep learning architecture widely applied in pattern recognition and image processing. The method avoids the labor and material cost of manual image feature selection, improves the accuracy of flame recognition and detection, and provides a new approach to fire detection.
Disclosure of Invention
The method mainly constructs a flame sample library by means of data enhancement; designs a flame recognition model, FlameNet, based on an optimization strategy of replacing a large convolution kernel with stacked small kernels in double convolution layers, comparing CNN models with different kernel counts and kernel sizes; and designs a flame detection model drawing on the Faster-RCNN object detection technique. A flame detection device is designed and implemented on the Matlab GUI platform; it maintains recognition and detection performance while offering a certain anti-interference capability, providing a new idea for flame detection. The specific content is as follows:
1. A method for optimized flame recognition and monitoring based on a convolutional neural network comprises the following steps:
a) optimizing the performance of the convolutional neural network;
b) constructing a flame image sample library in a mode of increasing data diversity and data enhancement;
c) designing a flame identification model by optimizing the number of convolution kernels, the size of the convolution kernels and the number of model layers;
d) designing a flame detection model based on the Faster-RCNN algorithm.
2. In the flame recognition and monitoring method based on the convolutional neural network, optimizing the performance of the network comprises the following considerations:
a) increasing the number of convolution kernels enhances the network's ability to extract image features and further improves recognition accuracy, but lengthens the time to training convergence;
b) the same goal can be pursued with a convolution-pooling arrangement in which one or more identical convolution layers follow a convolution layer, forming multiple groups of convolution structures; adopting several such groups can improve test performance to a certain extent;
c) several small convolution kernels can replace one large kernel, realizing the original convolution operation while greatly reducing the model's parameter count; comparing the parameter quantities before and after the replacement shows the saving.
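As an illustration of point (c), the parameter saving can be checked with a short sketch (the 64-channel counts here are illustrative, not taken from the model):

```python
def conv_params(k, c_in, c_out, bias=True):
    """Weights (+ biases) of one k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# One 5x5 layer vs. two stacked 3x3 layers, 64 channels in and out (no bias).
single_5x5 = conv_params(5, 64, 64, bias=False)       # 5*5*64*64 = 102400
stacked_3x3 = 2 * conv_params(3, 64, 64, bias=False)  # 2*3*3*64*64 = 73728
print(single_5x5, stacked_3x3)                        # 102400 73728
```

The stacked 3 × 3 layers need roughly 28% fewer weights while covering the same receptive field.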
3. A flame image sample library is established: some image samples are downloaded from large visual-image dataset websites, and others are extracted frame by frame from experimentally recorded video.
The construction of the flame image sample library comprises the following steps:
a) increasing data diversity, which effectively improves test accuracy;
b) applying data enhancement, such as flipping, cropping, changing contrast, and adding noise, which raises test accuracy to a certain extent but slows convergence.
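The listed enhancement operations can be sketched in a few lines of plain Python on a toy grayscale image (a real pipeline would use an image library; the image values are illustrative):

```python
import random

def hflip(img):                      # mirror each row (horizontal flip)
    return [row[::-1] for row in img]

def crop(img, top, left, h, w):      # cut out an h x w patch
    return [row[left:left + w] for row in img[top:top + h]]

def adjust_contrast(img, factor, mid=128):   # scale distance from mid-gray
    return [[min(255, max(0, int(mid + (p - mid) * factor))) for p in row]
            for row in img]

def add_noise(img, sigma, rng):      # additive Gaussian noise, clipped to [0, 255]
    return [[min(255, max(0, int(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]

img = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
print(hflip(img)[0])            # [30, 20, 10]
print(crop(img, 0, 1, 2, 2))    # [[20, 30], [50, 60]]
```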
4. A flame identification model, comprising:
Layer 1 is the input layer of the model and accepts RGB images of size 64 × 64;
Layer 2 is a multiple convolutional layer, Conv1, comprising two small convolution layers; the kernels of Conv1-1 and Conv1-2 are both 3 × 3, with 32 kernels and stride 1, and the output of the two small layers passes through the ReLU activation function before entering the next stage;
Layer 3 is a pooling layer using 2 × 2 max pooling with stride 2;
Layer 4 is a multiple convolutional layer, Conv2, comprising two small convolution layers; the kernels of Conv2-1 and Conv2-2 are both 3 × 3, with 64 kernels and stride 1, and the output of the two small layers passes through the ReLU activation function before entering the next stage;
Layer 5 is a pooling layer using 2 × 2 max pooling with stride 2;
Layer 6 is the fully connected layer of the model, containing 500 hidden neural nodes;
Layer 7 is the output layer of the model; a Softmax classifier judges whether the input picture is flame or background.
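The layer sizes above can be sanity-checked with the standard output-size formulas. The padding scheme is an assumption, since the disclosure does not state it; "same" padding (pad = 1) is assumed for the 3 × 3 convolutions:

```python
def conv_out(n, k, stride=1, pad=0):
    """Spatial output size of a convolution on an n x n input."""
    return (n + 2 * pad - k) // stride + 1

def pool_out(n, k=2, stride=2):
    """Spatial output size of a k x k pooling layer."""
    return (n - k) // stride + 1

n = 64                       # input: 64 x 64 RGB image
n = conv_out(n, 3, pad=1)    # Conv1-1 -> 64 (assuming "same" padding)
n = conv_out(n, 3, pad=1)    # Conv1-2 -> 64
n = pool_out(n)              # Pool1   -> 32
n = conv_out(n, 3, pad=1)    # Conv2-1 -> 32
n = conv_out(n, 3, pad=1)    # Conv2-2 -> 32
n = pool_out(n)              # Pool2   -> 16
flat = n * n * 64            # 16*16*64 = 16384 features into the FC layer
print(n, flat)               # 16 16384
```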
5. The flame recognition model FlameNet is characterized in that the network layers are connected layer by layer, and the successive convolution-pooling structure extracts more effective information from the flame sample library.
In terms of receptive field, two successive 3 × 3 convolution layers are equivalent to one 5 × 5 convolution layer, but the network model's parameters are greatly reduced.
The ReLU activation function strengthens the model's nonlinear expressiveness and helps enhance the abstraction capability of local features.
Adding a Dropout layer reduces the network's computational load and effectively controls the overfitting problem.
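The receptive-field equivalence claimed for the stacked 3 × 3 layers can be verified with the usual recurrence (a minimal sketch):

```python
def receptive_field(layers):
    """Receptive field on the input for a stack of (kernel, stride) conv layers."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1) * current jump
        jump *= s              # stride compounds the step between output pixels
    return rf

print(receptive_field([(3, 1), (3, 1)]))   # 5 -> same as one 5x5 layer
print(receptive_field([(3, 1)] * 3))       # 7 -> same as one 7x7 layer
```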
6. A flame detection model is designed based on the Faster-RCNN object detection algorithm of Ren et al., using the convolution-pooling part of the ZFNET model, namely Layer1-Layer5, as the shared convolution layers.
The ZFNET model is fine-tuned from the AlexNet model: the kernel of the first convolution layer is changed from 11 × 11 to 7 × 7 and the stride from 4 to 2. The resulting flame detection model is named the FRCNN-ZF model.
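The effect of changing the first-layer kernel and stride can be illustrated with the basic convolution output-size formula (the 224 × 224 input and zero padding are assumptions for illustration only):

```python
def conv_out(n, k, stride, pad=0):
    """Spatial output size of a convolution on an n x n input."""
    return (n + 2 * pad - k) // stride + 1

# First-layer feature map width for a 224 x 224 input, no padding assumed:
print(conv_out(224, 11, 4))   # 11x11 kernel, stride 4 (AlexNet-style) -> 54
print(conv_out(224, 7, 2))    # 7x7 kernel, stride 2 (ZFNET-style)     -> 109
```

The smaller kernel and stride roughly double the first-layer resolution, which preserves more fine detail for the shared layers.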
The RPN network is designed mainly by considering the size of the anchor sliding windows, which are modified to match the flame sizes in the sample library images (64 × 64 and 224 × 224):
Anchor window   Original size (H × W)   Modified size (H × W)
Window 1        128 × 128               64 × 32
Window 2        128 × 256               32 × 64
Window 3        256 × 128               64 × 64
Window 4        256 × 256               128 × 64
Window 5        256 × 512               64 × 128
Window 6        512 × 256               128 × 128
Window 7        512 × 512               256 × 128
Window 8        512 × 1024              128 × 256
Window 9        1024 × 512              256 × 256
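Given the modified sizes in the table, the anchor boxes centered on a sliding-window position can be generated as follows (a minimal sketch; the center coordinates are arbitrary):

```python
def make_anchors(cx, cy, sizes):
    """Axis-aligned boxes (x1, y1, x2, y2) centered at (cx, cy).
    sizes: list of (height, width) pairs."""
    boxes = []
    for h, w in sizes:
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

# The nine modified (height x width) sizes from the table above.
sizes = [(64, 32), (32, 64), (64, 64), (128, 64), (64, 128),
         (128, 128), (256, 128), (128, 256), (256, 256)]
anchors = make_anchors(100, 100, sizes)
print(len(anchors), anchors[0])   # 9 (84.0, 68.0, 116.0, 132.0)
```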
7. After training, the flame detection model FRCNN-ZF exhibits the following characteristics:
its recall and precision are high, as seen from the P-R curve;
analysis of the monitoring images shows good flame detection rate and accuracy;
analysis of the monitoring images shows that the flame-marking regions generalize well and detect strongly, with some drift when the flame color is light or the flame is far away;
analysis of the monitoring images shows a certain anti-interference capability.
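Precision and recall, the quantities behind the P-R curve, are computed as follows (the counts are hypothetical, not measured results from the model):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical detection counts at one score threshold.
p, r = precision_recall(tp=95, fp=5, fn=10)
print(round(p, 3), round(r, 3))   # 0.95 0.905
```

Sweeping the score threshold and plotting (recall, precision) pairs produces the P-R curve referred to above.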
8. A flame detection system GUI is designed using Matlab GUI functionality. The Faster-RCNN model and the flame video are loaded in sequence; the system extracts the video frame by frame and feeds each frame to the Faster-RCNN model for detection. If the model judges the flame score to exceed 0.8, it frames the flame region in the image and reports the score, and the computer sends an alarm instruction to the alarm device to attract people's attention.
9. In the flame detection system GUI, the interface is mainly divided into an image area and an operation area: the image area presents the original image and the detection result, while the operation area holds the operations for loading the detection model, loading images or video, and running detection.
The detection functions and part of the GUI code include:
(1) Detection model: loading the Faster-RCNN model. The code for this function is as follows:
function LoadFRCNN_Callback(hObject, eventdata, handles)
global Predictor;
% Ask the user for a saved Faster-RCNN model (.mat file)
[filename, pathname] = uigetfile({'*.mat'}, 'Read Faster-RCNN model');
if isequal(filename, 0)
    msgbox('No model selected; the system will use default settings');
else
    pathfile = fullfile(pathname, filename);
    Predictor.LoadFRCNN(pathfile);
    msgbox('Model loaded successfully');
end
(2) Loading the image. The code for this function is as follows:
function LoadPicture_Callback(hObject, eventdata, handles)
global Predictor;
% Ask the user for a picture file
[filename, pathname] = uigetfile({'*.jpg'; '*.png'}, 'Read picture file');
if isequal(filename, 0)
    msgbox('No picture selected');
else
    pathfile = fullfile(pathname, filename);
    frame = imread(pathfile);
    Predictor.Mat = imresize(frame, [240 320]);   % resize for display
    axes(handles.axes1);
    imshow(Predictor.Mat);
end
(3) Detecting the image. The code for this function is as follows:
function vid_detect_Callback(hObject, eventdata, handles)
global Predictor;
frame = Predictor.Mat;        % image loaded by LoadPicture_Callback
outputImage = frame;
% The detector handle stored by LoadFRCNN is assumed to be Predictor.Detector
[bboxes, scores, ~] = detect(Predictor.Detector, frame);
if ~isempty(bboxes)
    for i = 1:size(bboxes, 1)
        box = bboxes(i, :);
        % annotate each box with its own score
        annotation = sprintf('%s: %f', 'Flame', scores(i));
        outputImage = insertObjectAnnotation(outputImage, 'rectangle', box, annotation);
    end
end
imshow(outputImage);
end
The beneficial technical effects of the invention are as follows:
(1) The flame recognition model FlameNet is designed and trained separately on the sample libraries before and after data enhancement; the recognition accuracy rises from 91.21% to 98.32%, showing that data enhancement can improve the model's recognition accuracy.
(2) Through experiments, it was found that when the Conv1 layer has 32 convolution kernels of size 3 × 3, the model converges quickly and the flame recognition accuracy peaks at 98.54%.
(3) On the basis of the Faster-RCNN detection method, the anchor sliding-window sizes are modified, and the FRCNN-ZF model is verified to have stronger flame detection, generalization, and anti-interference capability.
(4) When the flame detection model is applied to a real fire, the response time is 6 seconds; the flame is detected and the alarm raised at an early stage, far faster than a smoke detector.
(5) A simple flame detection system built on the Matlab platform shows the flame detection results intuitively; it is simple and easy to use, so non-professionals can directly see flame detection with a convolutional neural network at work.
Detailed Description
The invention is described in further detail below with reference to the figures and the detailed description.
1. Flame recognition model
The structure of the CNN-based flame recognition model FlameNet is shown in FIG. 1.
The FlameNet flame recognition model designed in this method has 12 layers in total, connected layer by layer; the successive convolution-pooling structure extracts more effective information from the flame sample library. Two successive 3 × 3 convolution layers are equivalent to one 5 × 5 convolution layer in receptive field, but with far fewer network parameters. The ReLU activation function strengthens the model's nonlinear expressiveness and enhances the abstraction capability of local features, and the added Dropout layer reduces the network's computational load and effectively controls overfitting.
Since the convolutional neural network learns from input images end to end, visualizing the data features it learns is very helpful for understanding the network in depth. The specific recognition process is as follows:
(1) an image of a flame is input into the FlameNet model.
The feature maps output by the Conv1-1 convolution layer of the FlameNet model for the input flame image are shown in FIG. 2. Each small image shows the outline of the flame, from which it can be inferred that the Conv1-1 kernels mainly learn the edge-contour information of objects in the input image.
(2) Further analysis shows that adjacent flame-outline images have similar visual angles, while those farther apart differ slightly. This indicates that with more convolution kernels, the model can observe the object from more visual angles and learn more of its feature information, which benefits recognition. A suitable number of convolution kernels is therefore selected for further analysis of the picture.
2. Flame detection model
Flame detection means framing the exact location of the flame in the input image and labeling it.
The CNN-based flame detection model uses the convolution-pooling part of the ZFNET model as the shared convolution layers in the Faster-RCNN object detection algorithm. ZFNET is fine-tuned from the AlexNet model: the kernel of the first convolution layer is changed from 11 × 11 to 7 × 7 and the stride from 4 to 2; its structure is shown in FIG. 3. The convolution-pooling part of ZFNET, namely Layer1-Layer5, serves as the shared convolution layers. The specific detection process is as follows:
the fast-RCNN structure flow is shown in FIG. 4, and the flow is A, B, C three modules in sequence.
(1) In module A, the shared convolution layers extract features from the input image to obtain a global shared feature map. The Faster-RCNN model proposed by Ren et al. uses the convolution layers of the ZFNET model as the shared convolution layers.
(2) Module B computes the candidate boxes. The global shared feature map output by module A is fed into the RPN's shared convolution layer to obtain the RPN shared feature map, which is then fed into the classification and regression convolution layers to obtain M candidate boxes (ROIs) carrying coordinate and probability information. The N candidate boxes with the highest foreground probability are kept, and non-maximum suppression (NMS) then removes candidates that overlap a higher-probability box beyond a set proportion, as well as those containing no target, leaving K candidate boxes as the output of the RPN.
Note that k anchors are defined in module B, as shown in FIG. 5. Each current sliding window is mapped to its corresponding receptive field in the original image; centered on that receptive field, k anchors are defined, each corresponding to a box of one area and one aspect ratio, and after correction the anchors almost completely cover the positions of the ground-truth boxes in the image.
(3) In module C, the coordinates of the K candidate boxes output by module B are mapped onto the global shared feature map to obtain each candidate box's feature map. Since candidate boxes differ in size, size normalization (ROI Pooling) is required to obtain candidate-box feature maps of equal size. Each candidate-box feature map passes through a fully connected layer to give a feature vector, which then passes through the classification and regression fully connected layers to produce classification and correction vectors. Finally, non-maximum suppression retains the highest-probability candidate boxes to give the final object detection result.
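The non-maximum suppression step used in modules B and C can be sketched as a generic greedy NMS over intersection-over-union (the boxes, scores, and threshold are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, drop others overlapping it beyond thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # [0, 2]: the near-duplicate of box 0 is suppressed
```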
When the flame detection model is used for real fire detection, the response time is 6 seconds; the flame is detected and the alarm raised at an early stage, far faster than a smoke detector.
3. Flame detection system GUI
The flame detection system is designed using the GUI function of Matlab; its schematic diagram is shown in FIG. 6. The Faster-RCNN model and the flame video are loaded in sequence; the system extracts the video frame by frame and feeds each frame to the Faster-RCNN model for detection. If the model judges the flame score to exceed 0.8, it frames the flame region in the image and reports the score, and the computer sends an alarm instruction to the alarm device to attract people's attention.
The flame detection system is mainly divided into an image area and an operation area: the image area presents the original image and the detection result, while the operation area holds the operations for loading the detection model, loading images or video, and running detection. The specific process is as follows:
(1) Click "Detection model" to load the Faster-RCNN model.
(2) Click "Load image" and select the image to load. Video can also be loaded and played in a figure window of the GUI interface.
(3) Click "Detect image". The algorithm runs detection on the currently loaded image; the program automatically calls the pre-loaded Faster-RCNN flame detection model. The detection result yields the coordinates of the flame in the image, the rectangular selection box they enclose, and the probability value of the detected flame. For video, the GUI first decomposes it into frames, then detects each frame and displays the result synchronously.
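The frame-by-frame scan with the 0.8 alarm threshold described above can be sketched abstractly (the detector is replaced by hypothetical per-frame lists of box scores):

```python
ALARM_THRESHOLD = 0.8   # the score threshold stated in the disclosure

def scan_frames(frame_scores, threshold=ALARM_THRESHOLD):
    """Return the indices of frames whose best flame score exceeds the threshold."""
    alarms = []
    for idx, scores in enumerate(frame_scores):
        if scores and max(scores) > threshold:
            alarms.append(idx)   # this frame would trigger the alarm
    return alarms

# Hypothetical detector output: one list of box scores per video frame.
frames = [[0.12], [], [0.55, 0.83], [0.95]]
print(scan_frames(frames))   # [2, 3]
```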