CN119723473B - Security check image detection method, device, equipment and product

Info

Publication number: CN119723473B
Application number: CN202510246136.9A
Authority: CN
Inventors: 路文焕; 刘莞玲; 魏建国
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2025-03-04
Filing date: 2025-03-04
Publication date: 2025-06-20
Anticipated expiration: 2045-03-04
Also published as: CN119723473A

Abstract

The invention provides a security inspection image detection method, device, equipment and product, which can be applied to the technical field of target detection. The method comprises: obtaining a security inspection image, wherein the security inspection image is obtained by scanning an article with a ray scanning device; processing the security inspection image with a slice convolution network to obtain a slice convolution feature; processing the slice convolution feature with a multi-residual convolution network to obtain a residual convolution feature; processing the residual convolution feature based on an attention pooling network to obtain a pooled attention feature; processing the pooled attention feature and the residual convolution feature based on a feature pyramid network to obtain an attention fusion feature and a convolution attention feature; processing the attention fusion feature and the convolution attention feature based on a path aggregation network to obtain a fusion feature; and processing the attention fusion feature and the fusion feature based on a detection head to obtain a detection result of the security inspection image, wherein the detection result indicates whether the security inspection image contains an abnormal article.

Description

Security check image detection method, device, equipment and product

Technical Field

The invention relates to the technical field of target detection, and in particular to a security inspection image detection method, device, equipment and product.

Background

With the rapid development of tourism and transportation, X-ray security inspection machines play an increasingly important role in modern cities. X-ray security inspection machines are widely used in public places (such as airports and stations), where security inspectors must carefully examine X-ray images to ensure that passenger baggage contains no abnormal articles and thereby maintain safety and order in those places. However, the inventors have found that the traditional security inspection mode suffers from low manual efficiency, insufficient expertise and a high miss rate, and can hardly meet modern society's requirements for fast, efficient and accurate security inspection.

Disclosure of Invention

In view of the above problems, the invention provides a security inspection image detection method, device, equipment and product.

According to a first aspect of the invention, a security inspection image detection method is provided, comprising: obtaining a security inspection image, wherein the security inspection image is obtained by scanning an article with a ray scanning device; processing the security inspection image with a slice convolution network to obtain a slice convolution feature; processing the slice convolution feature with a multi-residual convolution network to obtain a residual convolution feature; processing the residual convolution feature based on an attention pooling network to obtain a pooled attention feature; processing the pooled attention feature and the residual convolution feature based on a feature pyramid network to obtain an attention fusion feature and a convolution attention feature; processing the attention fusion feature and the convolution attention feature based on a path aggregation network to obtain a fusion feature; and processing the attention fusion feature and the fusion feature based on a detection head to obtain a detection result of the security inspection image, wherein the detection result indicates whether the security inspection image contains an abnormal article.

A second aspect of the present invention provides a security inspection image detection apparatus, comprising:

an acquisition module, configured to acquire a security inspection image, wherein the security inspection image is obtained by scanning an article with a ray scanning device;

a slice convolution feature obtaining module, configured to process the security inspection image with a slice convolution network to obtain a slice convolution feature;

a residual convolution feature obtaining module, configured to process the slice convolution feature with a multi-residual convolution network to obtain a residual convolution feature;

a pooled attention feature obtaining module, configured to process the residual convolution feature based on an attention pooling network to obtain a pooled attention feature;

a convolution attention fusion obtaining module, configured to process the pooled attention feature and the residual convolution feature based on a feature pyramid network to obtain an attention fusion feature and a convolution attention feature;

a fusion feature obtaining module, configured to process the attention fusion feature and the convolution attention feature based on a path aggregation network to obtain a fusion feature; and

a detection result obtaining module, configured to process the attention fusion feature and the fusion feature based on a detection head to obtain a detection result of the security inspection image, wherein the detection result indicates whether the security inspection image contains an abnormal article.

A third aspect of the invention provides an electronic device comprising one or more processors and a memory for storing one or more computer programs, wherein the one or more processors execute the one or more computer programs to implement the steps of the method described above.

A fourth aspect of the invention provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the method described above.

According to the embodiments of the invention, the security inspection image is processed by the slice convolution network to obtain the slice convolution feature, and the slice convolution feature is processed by the multi-residual convolution network to obtain the residual convolution feature. Processing the residual convolution feature based on the attention pooling network pools the features at the different scales corresponding to the residual convolution feature, which enriches semantic information; the pooled features are then processed based on an attention mechanism to obtain the pooled attention feature, which enlarges the receptive field and benefits the detection of small targets. The pooled attention feature and the residual convolution feature are processed based on the feature pyramid network to obtain the attention fusion feature and the convolution attention feature; the attention fusion feature and the convolution attention feature are processed based on the path aggregation network to obtain the fusion feature; and the attention fusion feature and the fusion feature are processed based on the detection head to obtain the detection result of the security inspection image. Detection precision, detection performance on small and occluded targets, and the generalization capability of the model are thereby improved.

Drawings

The foregoing and other objects, features and advantages of the invention will be apparent from the following description of embodiments of the invention with reference to the accompanying drawings, in which:

FIG. 1 shows an application scenario diagram of a security inspection image detection method and apparatus according to an embodiment of the present invention;

FIG. 2 shows a flow chart of a security inspection image detection method according to an embodiment of the invention;

FIG. 3 shows a block diagram of a serial pooling layer according to an embodiment of the invention;

FIG. 4 illustrates a block diagram of a self-attention layer in accordance with an embodiment of the present invention;

FIG. 5 shows a network structure diagram of a detection model according to an embodiment of the present invention;

FIG. 6 illustrates a flow chart of a detection model training and conversion method according to an embodiment of the present invention;

FIG. 7 shows an RKNN image inference flow chart of the target chip-based real-time security inspection image detection system of the present invention;

FIG. 8 shows a flow chart of transferring the converted RKNN model to the target chip for inference in the target chip-based real-time security inspection image detection system of the present invention;

FIG. 9 shows a schematic diagram of a security inspection image detection result obtained with the security inspection image detection method;

FIG. 10 shows a block diagram of a security inspection image detection device according to an embodiment of the present invention;

FIG. 11 shows a block diagram of an electronic device adapted to implement the security inspection image detection method according to an embodiment of the invention.

Detailed Description

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.

Where a convention analogous to "at least one of A, B and C, etc." is used, such a convention should in general be interpreted in the sense in which one skilled in the art would commonly understand it (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).

With the rapid development of tourism and transportation, X-ray security inspection machines play an increasingly important role in modern cities. X-ray security inspection machines are widely used in public places (such as airports and stations), where security inspectors must carefully examine X-ray images to ensure that passenger baggage contains no abnormal articles and thereby maintain safety and order in those places. However, the inventors have found that the traditional security inspection mode suffers from low manual efficiency, insufficient expertise and a high miss rate, and can hardly meet modern society's requirements for security inspection.

Abnormal article detection in X-ray security inspection images based on computer vision and artificial intelligence is attracting growing attention and research. It aims to use algorithms and models from computer vision, image processing and machine learning to automatically identify and mark abnormal articles in X-ray images, which are then checked a second time manually. This improves the accuracy and efficiency of security inspection, reduces the influence of human factors on the results, and lowers the skill requirements and training costs for security inspectors. However, existing target detection networks are designed for natural images; when they are applied to the X-ray images produced by security inspection machines, detection is poor and missed and false detections occur easily.

In view of the above, the invention provides a security inspection image detection method, device, equipment and product. The method comprises: obtaining a security inspection image, wherein the security inspection image is obtained by scanning an article with a ray scanning device; processing the security inspection image with a slice convolution network to obtain a slice convolution feature; processing the slice convolution feature with a multi-residual convolution network to obtain a residual convolution feature; processing the residual convolution feature based on an attention pooling network to obtain a pooled attention feature; processing the pooled attention feature and the residual convolution feature based on a feature pyramid network to obtain an attention fusion feature and a convolution attention feature; processing the attention fusion feature and the convolution attention feature based on a path aggregation network to obtain a fusion feature; and processing the attention fusion feature and the fusion feature based on a detection head to obtain a detection result of the security inspection image, wherein the detection result indicates whether the security inspection image contains an abnormal article.

It should be noted that the security inspection image detection method and device provided by the invention can be used in the technical field of target detection and also in other fields, such as the public security field; the application field of the method and device provided by the invention is not limited.

In the technical solution of the invention, the user information involved (including but not limited to user personal information, user image information, and user equipment information such as location information) and the data involved (including but not limited to data for analysis, stored data and displayed data) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure and application of such data all comply with the relevant laws, regulations and standards, necessary security measures are taken, public order and good morals are not violated, and corresponding operation entries are provided for the user to grant or refuse authorization.

In scenarios where personal information is used for automated decision-making, the method, device and system provided by the embodiments of the invention provide corresponding operation entries for users to accept or reject the automated decision result; if a user chooses to reject it, an expert decision process is entered. The expression "automated decision" here refers to the activity of automatically analyzing and assessing an individual's behavioral habits, hobbies, or economic, health or credit status by means of a computer program, and making a decision. The expression "expert decision" here refers to the activity of making a decision by a person who specializes in a certain field of work, has specialized experience, knowledge and skills, and has reached a certain level of expertise.

Fig. 1 shows an application scenario diagram of a security inspection image detection method and device according to an embodiment of the invention.

As shown in fig. 1, the application scenario according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the first terminal device 101, the second terminal device 102 and the third terminal device 103, to receive or send messages and the like. Various communication client applications may be installed on the first terminal device 101, the second terminal device 102 and the third terminal device 103, such as shopping applications, web browsers, search applications, instant messaging tools, mailbox clients and social platform software (by way of example only).

The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that, the security inspection image detection method provided by the embodiment of the present invention may be generally executed by the server 105. Accordingly, the security inspection image detection device provided by the embodiment of the invention can be generally arranged in the server 105. The security inspection image detection method provided by the embodiment of the invention may also be performed by a server or a server cluster which is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103 and/or the server 105. Accordingly, the security inspection image detection apparatus provided by the embodiment of the present invention may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Fig. 2 shows a flowchart of a security inspection image detection method according to an embodiment of the present invention.

As shown in fig. 2, the security inspection image detection method of the embodiment includes operations S210 to S270, and the security inspection image detection method may be executed by an electronic device.

In operation S210, a security inspection image is acquired.

In operation S220, the security inspection image is processed using the slice convolution network to obtain a slice convolution feature.

In operation S230, the slice convolution feature is processed using a multi-residual convolution network to obtain a residual convolution feature.

In operation S240, the residual convolution feature is processed based on the attention pooling network, resulting in a pooled attention feature.

In operation S250, pooled attention features and residual convolution features are processed based on the feature pyramid network, resulting in attention fusion features and convolution attention features.

In operation S260, the attention fusion feature and the convolution attention feature are processed based on the path aggregation network, resulting in a fusion feature.

In operation S270, the attention fusion feature and the fusion feature are processed based on the detection head to obtain a detection result of the security inspection image, the detection result indicating whether the security inspection image contains an abnormal article.
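The data flow of operations S210 to S270 can be summarized in the following minimal sketch. It records only which named features each operation consumes and produces; the names `run_pipeline`, `check_wiring` and the feature labels are hypothetical, and no network is actually implemented.

```python
from typing import List, Tuple

# One trace entry: (operation, input feature names, output feature names).
Trace = List[Tuple[str, Tuple[str, ...], Tuple[str, ...]]]


def run_pipeline(image_name: str = "security_image") -> Trace:
    """Record the wiring of operations S210 to S270 (names are illustrative)."""
    trace: Trace = []

    def step(op: str, inputs: List[str], outputs: List[str]) -> None:
        trace.append((op, tuple(inputs), tuple(outputs)))

    step("S210 acquire image", [], [image_name])
    step("S220 slice convolution network", [image_name], ["slice_conv"])
    step("S230 multi-residual convolution network", ["slice_conv"], ["residual_conv"])
    step("S240 attention pooling network", ["residual_conv"], ["pooled_attention"])
    step("S250 feature pyramid network",
         ["pooled_attention", "residual_conv"],
         ["attention_fusion", "conv_attention"])
    step("S260 path aggregation network",
         ["attention_fusion", "conv_attention"], ["fusion"])
    step("S270 detection head", ["attention_fusion", "fusion"], ["detection_result"])
    return trace


def check_wiring(trace: Trace) -> bool:
    """Return True if every input was produced by an earlier operation."""
    available = set()
    for _, inputs, outputs in trace:
        if not set(inputs) <= available:
            return False
        available.update(outputs)
    return True


trace = run_pipeline()
```

Note that the detection head in S270 consumes the attention fusion feature from S250 directly, in addition to the fusion feature from S260.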

According to the embodiments of the invention, the security inspection image is obtained by scanning an article with a ray scanning device; an open-source dataset of X-ray security inspection images containing abnormal articles may also be used. The dataset may include a plurality of X-ray security inspection images covering multiple types of abnormal articles, such as batons, pliers, hammers, chargers, scissors, wrenches, sprayers and lighters, but the embodiments of the invention are not limited thereto.

According to the embodiments of the invention, the security inspection image is processed by the slice convolution network to obtain the slice convolution feature. Specifically, the slice convolution network may slice the security inspection image and then perform a convolution operation to obtain the slice convolution feature, where the size of the slice convolution feature may be [...] channel features.
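One possible reading of slicing before convolution is the space-to-depth slicing used by Focus-style stems in some YOLO detectors: the image is sampled at every second pixel with four different offsets, and the four half-resolution sub-images are stacked as channels. The text does not state the slicing pattern, so this is an assumption; the helper name `slice_image` is hypothetical and the subsequent convolution is omitted.

```python
from typing import List

Image = List[List[float]]


def slice_image(img: Image) -> List[Image]:
    """Split an H x W image (H and W even) into four (H/2) x (W/2) sub-images.

    Sub-image (dy, dx) keeps rows dy, dy+2, ... and columns dx, dx+2, ...,
    so stacking the four results as channels loses no pixel.
    """
    return [
        [row[dx::2] for row in img[dy::2]]
        for dy in (0, 1)
        for dx in (0, 1)
    ]


img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
channels = slice_image(img)
```

Under this assumption a single-channel 4 x 4 image becomes four 2 x 2 channels, halving the spatial resolution without discarding information.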

According to the embodiments of the invention, the slice convolution feature is processed by the multi-residual convolution network to obtain the residual convolution feature. When the residual convolution feature is processed based on the attention pooling network, the features at the different scales corresponding to the residual convolution feature are pooled, and the pooled features are processed based on an attention mechanism to obtain the pooled attention feature.

According to the embodiments of the invention, the feature pyramid network has a top-down structure in which high-level feature maps are upsampled and then fused with low-level feature maps. The pooled attention feature and the residual convolution feature can be processed based on the feature pyramid network to obtain the attention fusion feature and the convolution attention feature.

According to the embodiments of the invention, the path aggregation network has a bottom-up structure and can pass position information up to the higher levels. The attention fusion feature and the convolution attention feature can be processed based on the path aggregation network to obtain the fusion feature.
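The top-down and bottom-up directions can be illustrated on two tiny single-channel feature maps. This is a sketch under stated assumptions: fusion is shown as element-wise addition, upsampling as nearest-neighbour repetition, and downsampling as keeping every second element, whereas the text only says the maps are fused. All function names are hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]


def upsample2x(fm: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling: repeat every column and every row."""
    out: FeatureMap = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(fm: FeatureMap) -> FeatureMap:
    """Keep every second row and column (stand-in for a strided convolution)."""
    return [row[::2] for row in fm[::2]]


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two equally sized maps (assumed fusion operator)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_then_bottom_up(low: FeatureMap, high: FeatureMap) -> Tuple[FeatureMap, FeatureMap]:
    # Top-down (feature pyramid): upsample the high level, fuse into the low level.
    fused_low = add(upsample2x(high), low)
    # Bottom-up (path aggregation): downsample the fused low level, fuse upward.
    fused_high = add(downsample2x(fused_low), high)
    return fused_low, fused_high


low = [[10, 20, 30, 40], [50, 60, 70, 80]]   # 2 x 4 low-level map
high = [[1, 2]]                              # 1 x 2 high-level map
fused_low, fused_high = top_down_then_bottom_up(low, high)
```

The low-level map receives semantic content from above on the way down, and the high-level map receives the refined low-level content on the way back up.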

According to the embodiments of the invention, the detection head is responsible for converting features into specific detection results. The attention fusion feature and the fusion feature are processed based on the detection head to obtain the detection result of the security inspection image; the detection result indicates whether the security inspection image contains an abnormal article, and includes the type of the target article, the bounding box information, and the width and height.
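A detection result of this kind could be represented as follows. This is only an illustrative data structure: the field names, the top-left box convention, the confidence `score` and the `min_score` threshold are assumptions that do not appear in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str     # type of the target article, e.g. "scissors"
    x: float       # bounding-box position (assumed: top-left corner)
    y: float
    width: float
    height: float
    score: float   # hypothetical confidence value


def contains_abnormal_article(detections: List[Detection], min_score: float = 0.5) -> bool:
    """Flag the image when at least one detection reaches the threshold."""
    return any(d.score >= min_score for d in detections)


detections = [Detection("scissors", 40.0, 25.0, 60.0, 30.0, 0.91)]
flagged = contains_abnormal_article(detections)
```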

According to the embodiments of the invention, the security inspection image is processed by the slice convolution network to obtain the slice convolution feature, and the slice convolution feature is processed by the multi-residual convolution network to obtain the residual convolution feature. Processing the residual convolution feature based on the attention pooling network pools the features at the different scales corresponding to the residual convolution feature and processes the pooled features based on an attention mechanism to obtain the pooled attention feature, which enriches semantic information, enlarges the receptive field and benefits the detection of small targets. The pooled attention feature and the residual convolution feature are processed based on the feature pyramid network to obtain the attention fusion feature and the convolution attention feature; the attention fusion feature and the convolution attention feature are processed based on the path aggregation network to obtain the fusion feature; and the attention fusion feature and the fusion feature are processed based on the detection head to obtain the detection result of the security inspection image. Detection precision, detection performance on small and occluded targets, and the generalization capability of the model are thereby improved.

According to the embodiments of the invention, the attention pooling network may include a serial pooling layer and a self-attention layer. Processing the residual convolution feature based on the attention pooling network to obtain the pooled attention feature includes: processing the residual convolution feature with the serial pooling layer to obtain a multi-scale fusion feature; and processing the multi-scale fusion feature based on the self-attention layer to obtain the pooled attention feature.

According to the embodiments of the invention, processing the residual convolution feature with the serial pooling layer yields the multi-scale fusion feature, and the simplified pooling operation significantly increases computation speed. Processing the multi-scale fusion feature based on the self-attention layer yields the pooled attention feature and effectively captures global context information, taking the relations among all features in the image into account at once. This ability to capture global information lets the model better understand the overall structure of the security inspection image and avoids misjudgments caused by the loss of local features.

According to the embodiments of the invention, the serial pooling layer includes a convolution normalization block. Processing the residual convolution feature with the serial pooling layer to obtain the multi-scale fusion feature includes: processing the residual convolution feature with the convolution normalization block to obtain a first residual normalization feature; performing a multi-scale pooling operation on the first residual normalization feature to obtain a multi-scale pooling feature; and fusing the first residual normalization feature with the multi-scale pooling feature to obtain the multi-scale fusion feature.

FIG. 3 illustrates a block diagram of a serial pooling layer according to an embodiment of the invention.

As shown in FIG. 3, the residual convolution feature may be processed by a convolution normalization block 310 to obtain a first residual normalization feature. The first residual normalization feature may be processed by a 1st pooling block 320 to obtain a first pooled feature, the first pooled feature may be processed by a 2nd pooling block 330 to obtain a second pooled feature, and so on, until the (N-1)-th pooled feature is processed by an N-th pooling block 340 to obtain an N-th pooled feature. The first residual normalization feature is then fused with the multi-scale pooling feature, which includes the first to N-th pooled features, to obtain the multi-scale fusion feature. Here N is an integer greater than 1, and the N pooling blocks all use the same pooling kernel, for example max pooling with a kernel of size [...]; the embodiments of the invention do not limit the size of the pooling kernel.
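The serial arrangement of pooling blocks can be sketched on a single-channel map as follows. The sketch assumes stride-1 max pooling that preserves the map size, a kernel size `k = 3` chosen only to keep the example small, and fusion by stacking the maps as channels; the convolution normalization block is omitted, and the function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]


def max_pool_same(fm: FeatureMap, k: int = 3) -> FeatureMap:
    """k x k max pooling with stride 1; windows are clipped at the border."""
    h, w, r = len(fm), len(fm[0]), k // 2
    return [
        [
            max(
                fm[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def serial_pooling(fm: FeatureMap, n: int = 3, k: int = 3) -> List[FeatureMap]:
    """Run n identical pooling blocks in series; return the input plus every output."""
    stacked = [fm]
    for _ in range(n):
        # Each block pools the output of the previous block.
        stacked.append(max_pool_same(stacked[-1], k))
    return stacked


# A 7 x 7 map with a single active pixel in the centre.
fm = [[0] * 7 for _ in range(7)]
fm[3][3] = 1
stacked = serial_pooling(fm, n=2, k=3)
```

With stride 1, two 3 x 3 max-pooling blocks in series cover the same neighbourhood as one 5 x 5 block, which illustrates how a chain of blocks with one kernel size can still yield features at several scales.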

According to the embodiments of the invention, the residual convolution feature is processed by the convolution normalization block to obtain the first residual normalization feature, and a multi-scale pooling operation is performed on the first residual normalization feature to obtain the multi-scale pooling feature. Using pooling kernels of the same size throughout the multi-scale pooling operation reduces the complexity of pooling. In addition, progressively downsampling the first residual normalization feature through the multi-scale pooling operation captures multi-scale information more efficiently and reduces the amount of computation. Fusing the first residual normalization feature with the multi-scale pooling feature yields a multi-scale fusion feature that contains multi-scale information; this fusion retains context information at different scales while reducing the computational redundancy of traditional pooling. Reducing the number of pooling kernels and optimizing the pooling operation significantly lowers memory usage and computational complexity and enhances the robustness of the model.

According to the embodiments of the invention, processing the multi-scale fusion feature based on the self-attention layer to obtain the pooled attention feature includes: processing the multi-scale fusion feature based on convolution blocks to obtain a query feature, a key feature and a value feature; multiplying the query feature and the key feature to obtain a pixel association feature; processing the pixel association feature with a normalization algorithm to obtain an attention weight; obtaining a value attention feature from the attention weight and the value feature; processing the value attention feature based on a convolution block to obtain a value attention convolution feature; and performing a residual connection and addition operation on the value attention convolution feature and the multi-scale fusion feature to obtain the pooled attention feature.

Fig. 4 shows a block diagram of a self-attention layer according to an embodiment of the present invention.

As shown in FIG. 4, the multi-scale fusion feature 410 is processed based on the first convolution block 420 to obtain the query feature Query, based on the second convolution block 430 to obtain the key feature Key, and based on the third convolution block 440 to obtain the value feature Value. Query and Key are reshaped and then multiplied to obtain the pixel association feature, which measures the relation between each pixel and every other pixel across Query and Key. The pixel association feature may be normalized with a softmax function to obtain the attention weight. The attention weight may be applied to Value to obtain the value attention feature, and the value attention feature is processed based on the fourth convolution block 450 to obtain the value attention convolution feature. Finally, a residual connection and addition operation is performed on the value attention convolution feature and the multi-scale fusion feature to obtain the pooled attention feature.
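The self-attention computation can be sketched on a feature that has already been reshaped to N pixels by C channels. The sketch assumes each convolution block acts as a per-pixel linear map over channels (as a 1 x 1 convolution does) and that softmax is taken over each row of the pixel association feature; neither detail is stated in the text. The weight matrices `wq`, `wk`, `wv`, `wo` and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix,
                   wo: Matrix) -> Tuple[Matrix, Matrix]:
    """x: N pixels x C channels (the reshaped multi-scale fusion feature)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)   # Query, Key, Value
    association = matmul(q, transpose(k))                   # N x N pixel association
    weights = softmax_rows(association)                     # attention weights
    attended = matmul(weights, v)                           # value attention feature
    projected = matmul(attended, wo)                        # value attention convolution feature
    # Residual connection: add the input back element-wise.
    pooled = [[p + r for p, r in zip(pr, xr)] for pr, xr in zip(projected, x)]
    return pooled, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 pixels, 2 channels
pooled, weights = self_attention(x, identity, identity, identity, identity)
```

Because of the residual connection, setting the output weights `wo` to zero returns the input unchanged, so the attention branch only adds a correction on top of the multi-scale fusion feature.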

According to an embodiment of the invention, the multi-scale fusion feature is processed based on convolution blocks to obtain the query feature, the key feature and the value feature, and the query feature and the key feature are multiplied to obtain the pixel association feature. The pixel association feature evaluates the relation between each element of the feature and all other elements, which enables the model to capture global information in the multi-scale fusion feature and to better model long-distance dependencies. The pixel association feature is processed with a normalization algorithm to obtain the attention weight, and the value attention feature is obtained from the attention weight and the value feature, so that the focus of attention adjusts automatically to the key information in each input. The value attention feature is processed based on a convolution block to obtain the value attention convolution feature, and a residual connection and addition operation on the value attention convolution feature and the multi-scale fusion feature yields the pooled attention feature. Features of different scales are fused dynamically in this process, which can improve the detection performance of the model on multi-scale objects and can improve detection accuracy and robustness in complex scenes, for example when similar objects must be distinguished.

According to an embodiment of the invention, the multi-residual convolution network comprises I feature extraction layers, I being an integer greater than 1. Processing the slice convolution feature with the multi-residual convolution network to obtain the residual convolution features comprises: inputting the slice convolution feature to the 1st feature extraction layer to output the 1st residual convolution feature; inputting the (i-1)-th residual convolution feature to the i-th feature extraction layer to output the i-th residual convolution feature; and, when i = I, determining the I residual convolution features from the 1st residual convolution feature to the I-th residual convolution feature, where I ≥ i > 1 and both I and i are integers.

According to an embodiment of the present invention, the multi-residual convolution network may comprise I feature extraction layers, I being an integer greater than 1. Feature extraction and parameter reduction can be performed by the CBS module and the C3 module of each feature extraction layer. The CBS module may include a convolution block, a batch normalization block and a SiLU activation function; the SiLU activation function is unbounded above, bounded below, smooth and non-monotonic. This layer may output channel data of size […]. The C3 module may include three […] modules and a plurality of backbone residual blocks used for feature extraction, the number of backbone residual blocks being specified by a parameter, and may finally output channel data of size […].

According to an embodiment of the invention, the slice convolution feature is input to the 1st feature extraction layer to output the 1st residual convolution feature, and the (i-1)-th residual convolution feature is input to the i-th feature extraction layer to output the i-th residual convolution feature. When i = I, the I residual convolution features can be determined from the 1st residual convolution feature to the I-th residual convolution feature, where I ≥ i > 1.
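The chaining of the I feature extraction layers can be illustrated with a short Python sketch. The function name `run_feature_extraction` and the toy layers are hypothetical; real layers would be the CBS and C3 modules described above.

```python
from typing import Callable, List, Sequence


def run_feature_extraction(slice_feature, layers: Sequence[Callable]) -> List:
    """Feed the slice convolution feature through I layers in sequence.

    Layer 1 receives the slice convolution feature; layer i (i > 1) receives
    the (i-1)-th residual convolution feature. All I outputs are returned.
    """
    if len(layers) < 2:
        raise ValueError("I must be an integer greater than 1")
    outputs = []
    current = slice_feature
    for layer in layers:
        current = layer(current)
        outputs.append(current)
    return outputs


# Toy stand-ins for real layers: each one just doubles its input.
toy_layers = [lambda x: 2 * x for _ in range(3)]
features = run_feature_extraction(1, toy_layers)
print(features)  # [2, 4, 8]
```

Keeping every intermediate output, rather than only the last one, matters because later stages of the model consume residual convolution features from more than one layer.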

According to an embodiment of the invention, processing the attention fusion feature and the fusion feature based on the detection head to obtain the detection result of the security inspection image comprises: processing the attention fusion feature based on a first detection head to obtain a first detection result of the security inspection image, the first detection result representing whether an abnormal article of a first size exists in the security inspection image; and processing the fusion feature based on a second detection head to obtain a second detection result of the security inspection image, the second detection result representing whether an abnormal article of a second size exists in the security inspection image. The size difference between the first size and the second size is greater than a preset size threshold, and the detection head comprises the first detection head and the second detection head.

According to an embodiment of the invention, for example, the attention fusion feature has a scale of 20×20 and the fusion feature has a scale of 80×80. Each pixel of the feature map corresponding to the attention fusion feature covers a larger image area, so the attention fusion feature is suitable for capturing large targets. Each pixel of the feature map corresponding to the fusion feature covers a smaller image area, so the fusion feature is suitable for capturing small targets.
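The relation between grid scale and covered image area can be made concrete with a small calculation. The input side length of 640 pixels is an assumption made only for this illustration (the text does not state the input size here), and `pixels_per_cell` is a hypothetical helper that returns the stride of a grid, not the full receptive field.

```python
def pixels_per_cell(input_size: int, grid_size: int) -> int:
    """Side length, in input pixels, of the region assigned to one grid cell."""
    if input_size % grid_size != 0:
        raise ValueError("grid does not divide the input evenly")
    return input_size // grid_size


INPUT_SIZE = 640  # assumed network input side length; not stated in the text
for grid in (20, 80):
    side = pixels_per_cell(INPUT_SIZE, grid)
    print(f"{grid}x{grid} grid: each cell covers {side}x{side} pixels")
```

Under this assumption a cell of the 20×20 map spans 32×32 input pixels while a cell of the 80×80 map spans 8×8, which is why the coarser map suits large targets and the finer map suits small ones.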

According to an embodiment of the invention, using different detection heads for articles of different scales allows abnormal articles of different sizes to be detected in a targeted manner, which improves the efficiency and accuracy of security inspection image detection.

According to an embodiment of the invention, before the security inspection image is processed with the slice convolution network to obtain the slice convolution feature, the method further comprises: adjusting the contrast and brightness of an initial security inspection image with an image local adjustment algorithm to obtain an intermediate image; and performing image enhancement on the intermediate image based on a mosaic algorithm to obtain the security inspection image.

According to the embodiment of the invention, the contrast and brightness of the initial security inspection image can be adjusted by utilizing the image local adjustment algorithm to obtain the intermediate image, so that the detail and texture information of the image are increased. The specific calculation is shown in the formula (1):

(1);

Here x and y denote the pixel coordinates of the image. The other quantities in formula (1) are the enhanced image brightness, the luminance value of the corresponding pixel of the input image, the maximum luminance of the input image, and the logarithmic average of the luminance values of the input image; m × n is the size of the image. A very small non-zero constant is included so that a luminance of 0 does not cause numerical overflow when the logarithm is computed on a pure black pixel.
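As a hedged illustration only, the quantities listed above match a common global logarithmic luminance adaptation, in which the log-average luminance is the exponential of the mean of log(delta + L) over all m × n pixels and the output is log(L / avg + 1) divided by log(L_max / avg + 1). The sketch below implements that mapping; whether it coincides with formula (1) is an assumption, and the function names and the value of `delta` are hypothetical.

```python
import math
from typing import List


def log_average_luminance(lum: List[List[float]], delta: float = 1e-6) -> float:
    """exp of the mean of log(delta + L) over all m x n pixels."""
    m, n = len(lum), len(lum[0])
    total = sum(math.log(delta + v) for row in lum for v in row)
    return math.exp(total / (m * n))


def global_log_adaptation(lum: List[List[float]], delta: float = 1e-6) -> List[List[float]]:
    """Map input luminance to [0, 1] with a log curve scaled by the log-average."""
    avg = log_average_luminance(lum, delta)
    l_max = max(v for row in lum for v in row)
    denom = math.log(l_max / avg + 1.0)
    if denom == 0.0:  # completely black image: nothing to enhance
        return [[0.0 for _ in row] for row in lum]
    return [[math.log(v / avg + 1.0) / denom for v in row] for row in lum]


image = [[0.0, 10.0], [50.0, 200.0]]  # includes a pure black pixel
enhanced = global_log_adaptation(image)
```

The constant `delta` is what keeps `math.log` defined for the black pixel; without it the first entry of `image` would raise a math domain error.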

According to an embodiment of the invention, image enhancement may be performed on the intermediate image based on a mosaic algorithm to obtain the security inspection image. The image enhancement may include hue, saturation and exposure transformations in the hue-saturation-value (Hue Saturation Value, HSV) color system, random rotation and translation, and the like. A security inspection image of size […] may also be scaled, with its aspect ratio preserved, to an image of size […]; edges that fall short of the target size may be filled with white bars, and the resulting image is input to the slice convolution network.
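The aspect-preserving scaling with bar padding reduces to a small piece of arithmetic, sketched below. The helper name `letterbox_geometry`, the choice to split the padding evenly between opposite sides, and the example sizes are assumptions; the source sizes are not reproduced in this text.

```python
from typing import Tuple


def letterbox_geometry(src_w: int, src_h: int, dst_w: int, dst_h: int) -> Tuple[int, int, int, int, int, int]:
    """Return (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).

    The source is scaled by one factor so that it fits inside the target, and
    the leftover border is split between the two opposite sides.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_w = dst_w - new_w
    pad_h = dst_h - new_h
    pad_left, pad_top = pad_w // 2, pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top


# Hypothetical sizes: a 1000x500 scan fitted into a 640x640 network input.
print(letterbox_geometry(1000, 500, 640, 640))  # (640, 320, 0, 160, 0, 160)
```

The padded regions would then be filled with white, as the paragraph above describes, before the image is passed to the slice convolution network.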

According to the embodiment of the invention, the details and texture information of the image are increased by adjusting the brightness, contrast, color and other attributes of the initial security inspection image, so that the follow-up accurate target detection of the security inspection image is conveniently realized.

According to an embodiment of the invention, the pictures of the original data set are in Portable Network Graphics (PNG) format and are divided into four folders: training, simple, difficult and occluded. The labels of the pictures are stored, folder by folder, in four JavaScript Object Notation (JSON) files. Each JSON file contains the category and bounding box information of the abnormal articles in each picture, together with the picture width, height, file name, and the like.

According to an embodiment of the invention, the data set can be divided into a training set and a test set at a ratio of about 6:4. The training set has 29457 pictures. The test set is divided by the difficulty of abnormal article detection into three subsets, simple, difficult and occluded, with 9482, 3733 and 5005 pictures respectively.

According to an embodiment of the present invention, a small target may refer to a target smaller than 32×32 pixels, a medium target to a target between 32×32 and 96×96 pixels, and a large target to a target larger than 96×96 pixels. The numbers of small, medium and large targets in the data set are shown in table 1.

Table 1 Numbers of small, medium and large targets in the data set

Data set	Small targets (count)	Medium targets (count)	Large targets (count)
Simple	85	2278	7119
Difficult	65	1759	7068
Occluded	130	1392	3486
Training	278	11480	27950
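The size thresholds that define the three target categories can be written as a small classifier. This is a sketch under an assumption: the thresholds are compared against the bounding box area, and the boundary values 32×32 and 96×96 are counted as medium; the text does not say how boundary cases are handled.

```python
def size_class(width: float, height: float) -> str:
    """Classify a bounding box as small, medium or large by pixel area."""
    area = width * height
    if area < 32 * 32:
        return "small"
    if area <= 96 * 96:
        return "medium"
    return "large"


print(size_class(20, 20), size_class(50, 50), size_class(100, 120))
# small medium large
```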

According to an embodiment of the invention, the data set is collected from airports, subway stations and railway stations and covers a variety of real-world abnormal article detection cases, in particular articles that are deliberately hidden. A distinguishing feature of the data set is that the occluded (hidden) subset of the test set focuses on abnormal articles deliberately hidden among cluttered objects, for example by wrapping wires around an abnormal article to interfere with detection. Occlusion also arises because the articles in personal luggage are typically placed at random and overlap one another. These characteristics make abnormal article detection more difficult.

Fig. 5 shows a network structure diagram of a detection model according to an embodiment of the present invention.

As shown in fig. 5, the security inspection image is processed by the slice convolution network 510 to obtain the slice convolution feature. The slice convolution feature is input to the (i-1)-th feature extraction layer 521 to output the (i-1)-th residual convolution feature, and the (i-1)-th residual convolution feature is input to the i-th feature extraction layer 522 to output the i-th residual convolution feature. The multi-residual convolution network 520 may include I feature extraction layers.

According to an embodiment of the invention, the serial pooling layer 531 is utilized to process residual convolution features to obtain multi-scale fusion features, and the self-attention layer 532 is utilized to process multi-scale fusion features to obtain pooled attention features, wherein the attention pooling network 530 comprises the serial pooling layer 531 and the self-attention layer 532.

In accordance with an embodiment of the present invention, pooled attention features and residual convolution features are processed based on the feature pyramid network 540 to derive attention fusion features and convolution attention features. The attention fusion feature and the convolution attention feature are processed based on the path aggregation network 550 to obtain a fusion feature. The attention fusion feature is processed based on a first detection head 561 to obtain a first detection result of the security inspection image, the fusion feature is processed based on a second detection head 562 to obtain a second detection result of the security inspection image, and the detection head 560 comprises the first detection head 561 and the second detection head 562.

FIG. 6 illustrates a flow chart of a detection model training and conversion method according to an embodiment of the present invention.

As shown in fig. 6, the detection model training and conversion method includes operations S610 to S619.

In operation S610, the sample security image and the tag data are read.

In operation S611, the sample security image is processed using the slice convolution network, resulting in a sample slice convolution feature.

In operation S612, the sample slice convolution feature is processed using the multi-residual convolution network to obtain a sample residual convolution feature.

In operation S613, the sample residual convolution feature is processed based on the attention pooling network, resulting in a sample pooled attention feature.

In operation S614, the sample pooled attention feature and the sample residual convolution feature are processed based on the feature pyramid network to obtain a sample attention fusion feature and a sample convolution attention feature.

In operation S615, the sample attention fusion feature and the sample convolution attention feature are processed based on the path aggregation network, resulting in a sample fusion feature.

In operation S616, the sample attention fusion feature and the sample fusion feature are processed based on the detection head, and a detection result of the sample security inspection image is obtained.

In operation S617, the detection result of the sample security image and the tag data are processed according to the loss function to obtain a target loss value.

In operation S618, the detection model is trained according to the target loss value, and a trained detection model is obtained.

In operation S619, the trained detection model is converted into a target format and embedded into a target chip for hardware-accelerated inference.

According to an embodiment of the invention, the JSON annotation file information in the data set can be read, and the annotation information can be converted from the two-point coordinate form […] into the center-point coordinate plus width and height form […]. In this way the sample security inspection images and the tag data are read.
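The conversion from two-point form to center-point form is a one-line calculation per coordinate, sketched below. The function name and the assumption that the two points are the top-left and bottom-right corners are illustrative; the exact notation is not reproduced in this text.

```python
from typing import Tuple


def corners_to_center(x1: float, y1: float, x2: float, y2: float) -> Tuple[float, float, float, float]:
    """Convert (x1, y1, x2, y2) corner coordinates to (cx, cy, w, h)."""
    w = x2 - x1
    h = y2 - y1
    return x1 + w / 2.0, y1 + h / 2.0, w, h


print(corners_to_center(10, 20, 50, 100))  # (30.0, 60.0, 40, 80)
```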

According to an embodiment of the invention, the sample attention fusion feature and the sample fusion feature are decoded into the corresponding position prediction boxes on the image, and the detection result of the sample security inspection image is then obtained by score sorting and non-maximum suppression (Non-Maximum Suppression, NMS) screening. The detection result of the sample security inspection image and the tag data are processed according to the loss function to obtain the target loss value, and the detection model is trained according to the target loss value to obtain the trained detection model. The performance of the detection model can be evaluated using the complete intersection over union (Complete Intersection over Union, CIoU) and the intersection over union (Intersection over Union, IoU), whose calculation is shown in formula (2).

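$$
\mathrm{IoU}=\frac{|b\cap b^{gt}|}{|b\cup b^{gt}|},\qquad
\mathrm{CIoU}=\mathrm{IoU}-\frac{\rho^{2}(b,b^{gt})}{c^{2}}-\alpha v,\qquad
v=\frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right)^{2},\qquad
\alpha=\frac{v}{(1-\mathrm{IoU})+v}
\qquad (2)
$$

where IoU denotes the ratio of the intersection area to the union area of the prediction box $b$ and the real box $b^{gt}$; $\rho^{2}(b,b^{gt})$ is the squared distance between the center points of the prediction box $b$ and the real box $b^{gt}$; $c$ is the diagonal length of the circumscribed box (the smallest rectangular box enclosing the prediction box and the real box); $\alpha$ is the weight function; $v$ measures the consistency of the aspect ratio; $w$ and $h$ are the width and height of the prediction box $b$; and $w^{gt}$ and $h^{gt}$ are the width and height of the real box.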
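The quantities just defined can be computed directly. The sketch below follows the commonly published CIoU definition (IoU minus the normalized center distance minus the weighted aspect-ratio term); it is an illustration rather than the exact evaluation code of the embodiment, and the box layout `(x1, y1, x2, y2)` and the small `eps` guard are assumptions. Boxes are assumed to have positive width and height.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(b: Box, gt: Box) -> float:
    """Intersection area divided by union area."""
    iw = max(0.0, min(b[2], gt[2]) - max(b[0], gt[0]))
    ih = max(0.0, min(b[3], gt[3]) - max(b[1], gt[1]))
    inter = iw * ih
    union = (b[2] - b[0]) * (b[3] - b[1]) + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter
    return inter / union if union > 0 else 0.0


def ciou(b: Box, gt: Box, eps: float = 1e-9) -> float:
    """CIoU = IoU - rho^2 / c^2 - alpha * v."""
    i = iou(b, gt)
    # Squared distance between the two box centres.
    dx = (b[0] + b[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (b[1] + b[3]) / 2 - (gt[1] + gt[3]) / 2
    rho2 = dx ** 2 + dy ** 2
    # Squared diagonal of the smallest box enclosing both.
    cw = max(b[2], gt[2]) - min(b[0], gt[0])
    ch = max(b[3], gt[3]) - min(b[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its weight.
    w, h = b[2] - b[0], b[3] - b[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(w / h)) ** 2
    alpha = v / ((1 - i) + v + eps)
    return i - rho2 / c2 - alpha * v
```

For two identical boxes every penalty term vanishes and the value is 1; for disjoint boxes IoU is 0 and the center-distance term makes the value negative, which is what gives a useful signal even when boxes do not overlap.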

The final loss function consists of three parts: category loss, object confidence loss and bounding box location loss. The loss function is minimized with an optimizer to train the network model. The category loss and the confidence loss use a binary cross entropy loss function, calculated as shown in formula (3):

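$$
L_{\mathrm{BCE}}=-\frac{1}{n}\sum_{i=1}^{n}\Bigl[y_i\log\sigma(x_i)+(1-y_i)\log\bigl(1-\sigma(x_i)\bigr)\Bigr],\qquad
\sigma(x)=\frac{1}{1+\exp(-x)}
\qquad (3)
$$

where $n$ is the total number of samples, $x_i$ is the raw output value of the model for the i-th sample, $y_i$ is the true label of the i-th sample, $\sigma$ is the sigmoid function, and $\exp$ is the exponential function.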
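A binary cross entropy loss on raw model outputs can be evaluated without ever forming the sigmoid explicitly, which avoids overflow for large magnitudes. The sketch below is one standard way to do this; the function name is hypothetical and the code is an illustration, not the training code of the embodiment.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], labels: Sequence[float]) -> float:
    """Mean binary cross entropy computed from raw (pre-sigmoid) outputs.

    Uses the identity
        -[y*log(s) + (1-y)*log(1-s)] = max(x, 0) - x*y + log(1 + exp(-|x|))
    with s = sigmoid(x), which stays finite for large |x|.
    """
    if len(logits) != len(labels) or not logits:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for x, y in zip(logits, labels):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


print(bce_with_logits([0.0], [1.0]))  # log(2), about 0.693
```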

According to an embodiment of the invention, after the trained detection model is obtained, the target-related features of abnormal articles can be obtained with it. The trained detection model can be converted into a target format, such as the RKNN format, and embedded into a target chip for hardware-accelerated inference.

According to an embodiment of the invention, the trained detection model is converted into the target format and embedded into the target chip for hardware-accelerated inference, and a system equipped with the target chip processes security inspection images to obtain their detection results. The system acquires an image containing abnormal security inspection articles from the image acquisition unit, scales it to a preset size with its aspect ratio preserved, and sends it to an input queue. The operating environment of the neural network processing unit (Neural Network Processing Unit, NPU) is initialized, an image is taken from the input queue, the NPU with the highest computing performance performs inference on the image, and the inference result is sent to an inference queue. The inference result is read from the queue, the detection box with the highest score is selected and removed from the queue, and any remaining candidate box whose IoU with the selected box is greater than a threshold is discarded. The system draws the boxes on the original image, displays the detection result in real time and sends it to an output queue. Specific hardware units or cores on the RKNN board are controlled by a plurality of processes and are assigned to different processes to handle multiple tasks, which realizes parallel processing and improves overall computing efficiency. Through the cooperation of the central processing unit (Central Processing Unit, CPU) and the NPU, the CPU handles complex logic and control flow while the NPU concentrates on executing the deep learning algorithm, so that the hardware is used as fully as possible.
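The box screening step described above (take the highest-scoring box, then drop candidates that overlap it too much) is greedy non-maximum suppression. A minimal sketch follows; the threshold value 0.45 and the box layout `(x1, y1, x2, y2)` are assumptions, since the text does not reproduce the threshold.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.45) -> List[int]:
    """Return the indices of the boxes kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)            # highest-scoring remaining box
        keep.append(best)
        # Drop every candidate that overlaps the selected box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In the example the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the distant third box survives.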

FIG. 7 shows a RKNN image inference flow chart of the target chip-based security inspection image real-time detection system of the present invention.

As shown in fig. 7, an RKNN object may first be created (S710); the config interface is called to set the preprocessing parameters of the model (S720); the load_onnx interface is called to import the ONNX model (S730); and the build interface is called to build the RKNN model (S740). The init_runtime interface may be called to initialize the operating environment (S750), specifying the target platform and the NPU core scheduling mode, and the inference interface may be used to run inference and obtain an inference result (S760); the input data must be preprocessed according to the requirements of the model. The export_rknn interface may also be called to export the RKNN model (S770). After inference is complete, the release interface may be called to release the RKNN object (S780) and free its resources.
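The conversion part of this flow might look roughly as follows with the Rockchip rknn-toolkit2 Python package. This is a sketch under assumptions: the package is assumed to be installed, quantization is switched off for simplicity, and the wrapper name `convert_onnx_to_rknn` and its parameters are hypothetical. Steps S750 and S760 (init_runtime and inference) are left out because they need a simulator or a connected board and a preprocessed input image.

```python
def convert_onnx_to_rknn(onnx_path: str, rknn_path: str, target_platform: str) -> None:
    """Convert an ONNX model to RKNN format (steps S710-S740, S770, S780).

    Assumes the Rockchip rknn-toolkit2 package is available; it is imported
    lazily so that this module can be loaded on machines without it.
    """
    from rknn.api import RKNN

    rknn = RKNN()                                        # S710: create the object
    try:
        rknn.config(target_platform=target_platform)     # S720: model settings
        if rknn.load_onnx(model=onnx_path) != 0:         # S730: import ONNX
            raise RuntimeError("load_onnx failed")
        if rknn.build(do_quantization=False) != 0:       # S740: build RKNN model
            raise RuntimeError("build failed")
        if rknn.export_rknn(rknn_path) != 0:             # S770: export .rknn file
            raise RuntimeError("export_rknn failed")
    finally:
        rknn.release()                                   # S780: free resources
```

Releasing the object in a `finally` clause mirrors the last step of the flow chart and ensures the resources are freed even when an earlier step fails.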

According to the embodiment of the invention, the trained detection model with the target weight can be converted into a RKNN model in a virtual machine environment through a model conversion tool, and an X-ray security inspection image detection system is built on a target chip.

According to an embodiment of the invention, an X-ray security inspection image detection system is built on the target chip. The system acquires the images to be detected from its input and performs multi-process real-time detection on each image, comprising preprocessing, target inference and post-processing; the target chip returns the detection result, which is displayed on the detection system interface and stored. Preprocessing, target inference, and post-processing display and storage are accelerated by the cooperation of multiple processes on the CPU and the NPU. The image acquisition unit acquires an X-ray security inspection image; the image preprocessing unit preprocesses the image; the target inference unit runs inference on the preprocessed image based on the RKNN model to obtain candidate detection boxes; the post-processing display unit calls non-maximum suppression (NMS) to filter out duplicate boxes; and finally the original image is annotated and the inference result is displayed, thereby realizing real-time detection of X-ray security inspection images on an embedded platform.

FIG. 8 is a flow chart showing how the converted RKNN model is transferred to the target chip for inference in the target-chip-based real-time security inspection image detection system of the present invention.

As shown in fig. 8, an RKNN object may first be created (S810); based on the RKNN object, the load_rknn interface is called to import the RKNN model (S820); the init_runtime interface is called to initialize the operating environment (S830), specifying the target platform and the NPU core scheduling mode; the inference interface is used to run inference and obtain an inference result (S840), the input data being preprocessed according to the requirements of the model; and after inference is complete, the release interface is called to release the RKNN object (S850) and free its resources.

Fig. 9 shows a schematic diagram of a security inspection image detection result obtained based on the security inspection image detection method in the invention.

As shown in fig. 9, the image post-processing result of the target-chip-based real-time security inspection image detection system marks the detected abnormal articles, including a hammer (Hammer), a wrench (Wrench) and a lighter (Lighter).

Fig. 10 shows a block diagram of a security inspection image detection device according to an embodiment of the present invention.

As shown in fig. 10, the security inspection image detection apparatus 1000 of this embodiment includes an acquisition module 1010, a slice convolution feature obtaining module 1020, a residual convolution feature obtaining module 1030, a pooled attention feature obtaining module 1040, a convolution attention fusion obtaining module 1050, a fusion feature obtaining module 1060, and a detection result obtaining module 1070.

The acquisition module 1010 is configured to acquire a security inspection image, where the security inspection image is obtained by scanning an article with a ray scanning device;

The slice convolution feature obtaining module 1020 is configured to process the security inspection image by using a slice convolution network to obtain a slice convolution feature;

The residual convolution feature obtaining module 1030 is configured to process the slice convolution feature by using the multi-residual convolution network to obtain a residual convolution feature;

The pooled attention feature obtaining module 1040 is configured to process the residual convolution feature based on the attention pooling network to obtain a pooled attention feature;

The convolution attention fusion obtaining module 1050 is configured to process the pooled attention feature and the residual convolution feature based on the feature pyramid network to obtain an attention fusion feature and a convolution attention feature;

The fusion feature obtaining module 1060 is configured to process the attention fusion feature and the convolution attention feature based on the path aggregation network to obtain a fusion feature;

The detection result obtaining module 1070 is configured to process the attention fusion feature and the fusion feature based on the detection head to obtain a detection result of the security inspection image, where the detection result characterizes whether the security inspection image contains an abnormal article.

According to an embodiment of the invention, the pooled attention feature obtaining module 1040 includes a multi-scale fusion feature obtaining sub-module and a pooled attention feature obtaining sub-module.

The multi-scale fusion feature obtaining submodule is used for processing residual convolution features by utilizing the serial pooling layer to obtain the multi-scale fusion feature.

The pooled attention feature obtaining submodule is used for processing the multi-scale fusion features based on the self-attention layer to obtain pooled attention features, and the attention pooling network comprises a serial pooling layer and the self-attention layer.

The multi-scale fusion feature obtaining submodule comprises a first residual error normalization feature obtaining unit, a multi-scale pooling feature obtaining unit and a multi-scale fusion feature obtaining unit.

The first residual normalization feature obtaining unit is used for processing the residual convolution feature by utilizing the convolution normalization block to obtain the first residual normalization feature.

The multi-scale pooling feature obtaining unit is used for performing a multi-scale pooling operation on the first residual normalization feature to obtain the multi-scale pooling feature.

The multi-scale fusion feature obtaining unit is used for fusing the first residual error normalization feature and the multi-scale pooling feature to obtain the multi-scale fusion feature, and the serial pooling layer comprises a convolution normalization block.

According to an embodiment of the invention, the pooled attention feature obtaining submodule comprises a convolution block processing unit, a pixel association feature obtaining unit, an attention weight obtaining unit, a value attention feature obtaining unit, a value attention convolution feature obtaining unit and a pooled attention feature obtaining unit.

The convolution block processing unit is used for processing the multi-scale fusion feature based on convolution blocks to obtain the query feature, the key feature and the value feature.

The pixel association feature obtaining unit is used for multiplying the query feature and the key feature to obtain the pixel association feature.

The attention weight obtaining unit is used for processing the pixel association feature with a normalization algorithm to obtain the attention weight.

The value attention feature obtaining unit is used for obtaining the value attention feature from the attention weight and the value feature.

The value attention convolution feature obtaining unit is used for processing the value attention feature based on a convolution block to obtain the value attention convolution feature.

The pooled attention feature obtaining unit is used for performing a residual connection and addition operation on the value attention convolution feature and the multi-scale fusion feature to obtain the pooled attention feature.

According to an embodiment of the invention, the multi-residual convolution network comprises I feature extraction layers, I being an integer greater than 1.

According to an embodiment of the present invention, the residual convolution feature obtaining module 1030 includes a 1st residual convolution feature obtaining sub-module, an i-th residual convolution feature obtaining sub-module, and a determining sub-module.

The 1st residual convolution feature obtaining sub-module is used for inputting the slice convolution feature into the 1st feature extraction layer and outputting the 1st residual convolution feature.

The i-th residual convolution feature obtaining sub-module is used for inputting the (i-1)-th residual convolution feature into the i-th feature extraction layer and outputting the i-th residual convolution feature.

The determining sub-module is used for determining, when i = I, the I residual convolution features from the 1st residual convolution feature to the I-th residual convolution feature, where I ≥ i > 1 and both I and i are integers.

According to an embodiment of the present invention, the detection result obtaining module 1070 includes a first detection result obtaining sub-module and a second detection result obtaining sub-module.

The first detection result obtaining sub-module is used for processing the attention fusion feature based on the first detection head to obtain a first detection result of the security inspection image, where the first detection result represents whether an abnormal article of a first size exists in the security inspection image.

The second detection result obtaining sub-module is used for processing the fusion characteristic based on the second detection head to obtain a second detection result of the security inspection image, the second detection result represents whether the security inspection image has abnormal articles with a second size, the size difference value between the first size and the second size is larger than a preset size threshold, and the detection head comprises a first detection head and a second detection head.

According to an embodiment of the present invention, the security inspection image detection device 1000 further includes an intermediate image obtaining module and a security inspection image obtaining module.

The intermediate image obtaining module is used for adjusting the contrast and brightness of the initial security inspection image by utilizing an image local adjustment algorithm to obtain an intermediate image.

The security inspection image obtaining module is used for performing image enhancement on the intermediate image based on a mosaic algorithm to obtain the security inspection image.

According to an embodiment of the invention, any number of the acquisition module 1010, the slice convolution feature obtaining module 1020, the residual convolution feature obtaining module 1030, the pooled attention feature obtaining module 1040, the convolution attention fusion obtaining module 1050, the fusion feature obtaining module 1060 and the detection result obtaining module 1070 may be combined and implemented in one module, or any one of these modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the invention, at least one of these modules may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system on package or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware by any other reasonable way of integrating or packaging a circuit, or in any one of, or a suitable combination of, the three implementation manners of software, hardware and firmware. Alternatively, at least one of these modules may be implemented at least partially as a computer program module that, when executed, performs the corresponding function.

As shown in fig. 11, the electronic device according to the embodiment of the present invention includes a processor 1101 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The processor 1101 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 1101 may also include on-board memory for caching purposes. The processor 1101 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flow according to an embodiment of the invention.

In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are stored. The processor 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. The processor 1101 performs various operations of the method flow according to the embodiment of the present invention by executing programs in the ROM 1102 and/or the RAM 1103. Note that the program may be stored in one or more memories other than the ROM 1102 and the RAM 1103. The processor 1101 may also perform various operations of the method flow according to an embodiment of the present invention by executing programs stored in the one or more memories.

According to an embodiment of the invention, the electronic device 1100 may also include an input/output (I/O) interface 1105, which is also connected to the bus 1104. The electronic device 1100 may also include one or more of the following components connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, and the like. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage section 1108 as needed.

The present invention also provides a computer-readable storage medium that may be included in the apparatus/device/system described in the above embodiments, or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.

According to embodiments of the invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the invention, the computer-readable storage medium may include the ROM 1102 and/or the RAM 1103 described above and/or one or more memories other than the ROM 1102 and the RAM 1103.

Embodiments of the present invention also include a computer program product comprising a computer program containing program code for performing the method shown in the flowcharts. When the computer program product runs in a computer system, the program code is used for enabling the computer system to realize the security inspection image detection method provided by the embodiment of the invention.

The above-described functions defined in the system/apparatus of the embodiment of the present invention are performed when the computer program is executed by the processor 1101. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.

In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program can also be transmitted and distributed over a network medium in the form of signals, downloaded and installed via the communication section 1109, and/or installed from the removable medium 1111. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless or wired media, or any suitable combination of the foregoing.

In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1109, and/or installed from the removable medium 1111. The above-described functions defined in the system of the embodiment of the present invention are performed when the computer program is executed by the processor 1101. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the invention.

According to embodiments of the present invention, program code for carrying out the computer programs provided by embodiments of the present invention may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the invention can be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the present invention. In particular, the features recited in the various embodiments of the invention can be combined and/or integrated in various ways without departing from the spirit and teachings of the invention. All such combinations and/or integrations fall within the scope of the invention.

The embodiments of the present invention are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims

1. A security inspection image detection method, characterized by comprising the following steps:

acquiring a security inspection image, wherein the security inspection image is obtained by scanning an article by using a ray scanning device;

processing the security inspection image by using a slice convolution network to obtain a slice convolution feature;

processing the slice convolution feature by using a multi-residual convolution network to obtain a residual convolution feature;

processing the residual convolution feature based on an attention pooling network to obtain a pooled attention feature;

processing the pooled attention feature and the residual convolution feature based on a feature pyramid network to obtain an attention fusion feature and a convolution attention feature;

processing the attention fusion feature and the convolution attention feature based on a path aggregation network to obtain a fusion feature;

and processing the attention fusion feature and the fusion feature based on a detection head to obtain a detection result of the security inspection image, wherein the detection result represents whether the security inspection image contains an abnormal article.
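The data flow recited in claim 1 can be sketched as a chain of calls. This is a minimal illustration only: the function name `detect_security_image`, the dictionary keys, and the idea of passing each network in as a plain callable are assumptions made for the sketch and do not appear in the claims.

```python
from typing import Any, Callable, Dict

Stage = Callable[..., Any]


def detect_security_image(image: Any, nets: Dict[str, Stage]) -> Any:
    """Wire the stages of claim 1 together.

    Every network is a caller-supplied callable; the key names are hypothetical.
    """
    slice_feat = nets["slice_conv"](image)
    residual_feat = nets["multi_residual"](slice_feat)
    pooled_attn_feat = nets["attention_pooling"](residual_feat)
    # The feature pyramid network consumes two features and yields two features.
    attn_fusion_feat, conv_attn_feat = nets["fpn"](pooled_attn_feat, residual_feat)
    fusion_feat = nets["pan"](attn_fusion_feat, conv_attn_feat)
    # The detection head sees both the attention fusion feature and the fusion feature.
    return nets["head"](attn_fusion_feat, fusion_feat)
```

Note that the attention fusion feature is used twice: once as an input to the path aggregation network and once as a direct input to the detection head.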

2. The method of claim 1, wherein the processing the residual convolution feature based on the attention pooling network to obtain a pooled attention feature comprises:

processing the residual convolution feature by using a serial pooling layer to obtain a multi-scale fusion feature;

processing the multi-scale fusion feature based on a self-attention layer to obtain the pooled attention feature, wherein the attention pooling network comprises the serial pooling layer and the self-attention layer.

3. The method of claim 2, wherein the processing the residual convolution feature by using a serial pooling layer to obtain a multi-scale fusion feature comprises:

processing the residual convolution feature by using a convolution normalization block to obtain a first residual normalization feature;

performing a multi-scale pooling operation on the first residual normalization feature to obtain a multi-scale pooling feature;

and fusing the first residual normalization feature and the multi-scale pooling feature to obtain the multi-scale fusion feature, wherein the serial pooling layer comprises the convolution normalization block.
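The multi-scale pooling and fusion steps of claim 3 might look like the following sketch. It assumes that the pooling is size-preserving max pooling applied in series and that fusion is channel concatenation; the claim states neither, and the convolution normalization block is omitted. The names `max_pool_same` and `serial_pooling` are hypothetical.

```python
from typing import List

Map = List[List[float]]  # one single-channel feature map


def max_pool_same(x: Map, k: int = 5) -> Map:
    """Stride-1 max pooling whose window is clipped at the border (output keeps input size)."""
    h, w, r = len(x), len(x[0]), k // 2
    return [
        [
            max(
                x[i][j]
                for i in range(max(0, row - r), min(h, row + r + 1))
                for j in range(max(0, col - r), min(w, col + r + 1))
            )
            for col in range(w)
        ]
        for row in range(h)
    ]


def serial_pooling(x: Map, k: int = 5, stages: int = 3) -> List[Map]:
    """Return [x, pool(x), pool(pool(x)), ...] as channels to be fused (assumed: concatenation)."""
    channels = [x]
    for _ in range(stages):
        # Each stage pools the previous stage's output, so the receptive field grows.
        channels.append(max_pool_same(channels[-1], k))
    return channels
```

Applying a k x k max pool twice in series covers the same window as a single (2k-1) x (2k-1) max pool, which is why a serial arrangement can produce several scales with one small kernel.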

4. The method of claim 2, wherein the processing the multi-scale fusion feature based on the self-attention layer to obtain a pooled attention feature comprises:

processing the multi-scale fusion feature based on a convolution block to obtain a query feature, a key feature and a value feature;

multiplying the query feature and the key feature to obtain a pixel association feature;

processing the pixel association feature by using a normalization algorithm to obtain an attention weight;

obtaining a value attention feature according to the attention weight and the value feature;

processing the value attention feature based on the convolution block to obtain a value attention convolution feature;

and performing a residual connection and addition operation on the value attention convolution feature and the multi-scale fusion feature to obtain the pooled attention feature.
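The steps of claim 4 can be sketched over a flattened feature map, where each pixel is a vector of channel values. The sketch assumes that the convolution blocks are 1x1 convolutions (one weight matrix shared by all pixels) and that the normalization algorithm is softmax; the claim names neither, and no scaling factor is applied because none is recited. All function and parameter names are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Mat = List[Vec]


def linear(pixels: Mat, weight: Mat) -> Mat:
    """A 1x1 convolution: apply the same weight matrix to every pixel vector."""
    return [[sum(w * v for w, v in zip(row, p)) for row in weight] for p in pixels]


def softmax(scores: Vec) -> Vec:
    """Numerically stable softmax (assumed normalization algorithm)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pooled_attention(pixels: Mat, wq: Mat, wk: Mat, wv: Mat, wo: Mat) -> Mat:
    """Query/key/value projection, attention weighting, output projection, residual add."""
    q, k, v = linear(pixels, wq), linear(pixels, wk), linear(pixels, wv)
    # Pixel association feature: dot product of every query with every key.
    assoc = [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in assoc]
    # Value attention feature: attention-weighted sum of the value vectors.
    attended = [
        [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
        for row in weights
    ]
    projected = linear(attended, wo)
    # Residual connection: add the input (the multi-scale fusion feature) back.
    return [[p + x for p, x in zip(pr, px)] for pr, px in zip(projected, pixels)]
```

For the residual addition to be well defined, `wo` has to map the value dimension back to the channel count of the input pixels.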

5. The method of claim 1, wherein the multi-residual convolution network comprises I feature extraction layers, I being an integer greater than 1;

the processing the slice convolution feature by using a multi-residual convolution network to obtain a residual convolution feature comprises:

inputting the slice convolution feature to the 1st feature extraction layer, and outputting the 1st residual convolution feature;

inputting the (i-1)-th residual convolution feature to the i-th feature extraction layer, and outputting the i-th residual convolution feature; and

in the case of i = I, determining I residual convolution features based on the 1st to the I-th residual convolution features, wherein I ≥ i > 1.
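The iteration in claim 5 amounts to feeding each layer's output into the next layer while keeping every intermediate output. A minimal sketch, in which the function name `run_feature_layers` and the representation of a layer as a callable are assumptions:

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def run_feature_layers(slice_feature: T, layers: Sequence[Callable[[T], T]]) -> List[T]:
    """Return the 1st to I-th residual convolution features, in order."""
    if len(layers) < 2:
        raise ValueError("claim 5 requires I > 1 feature extraction layers")
    features: List[T] = []
    current = slice_feature
    for layer in layers:
        # The i-th layer consumes the (i-1)-th feature (the slice feature for i = 1).
        current = layer(current)
        features.append(current)
    return features
```

Keeping all I outputs, rather than only the last one, is what lets later stages such as the feature pyramid network draw on features from several depths.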

6. The method according to claim 1, wherein the processing the attention fusion feature and the fusion feature based on the detection head to obtain a detection result of the security inspection image comprises:

processing the attention fusion feature based on a first detection head to obtain a first detection result of the security inspection image, wherein the first detection result represents whether an abnormal article of a first size exists in the security inspection image;

and processing the fusion feature based on a second detection head to obtain a second detection result of the security inspection image, wherein the second detection result represents whether an abnormal article of a second size exists in the security inspection image, a size difference between the first size and the second size is greater than a preset size threshold, and the detection head comprises the first detection head and the second detection head.

7. The method of claim 1, wherein prior to processing the security inspection image using a slice convolution network to obtain a slice convolution feature, the method further comprises:

adjusting the contrast and brightness of the initial security inspection image by utilizing an image local adjustment algorithm to obtain an intermediate image;

and carrying out image enhancement on the intermediate image based on a mosaic algorithm to obtain the security inspection image.
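The two preprocessing steps of claim 7 can be illustrated with a toy sketch on grayscale images stored as nested lists. The claim does not name the local adjustment algorithm, so the tile-wise min-max stretch below is only a stand-in, and the mosaic step is reduced to tiling four equally sized images into a 2x2 grid, without the random scaling, cropping, or label remapping that practical mosaic augmentation usually involves. The names `local_stretch` and `mosaic` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale, values in 0..255


def local_stretch(img: Image, tile: int) -> Image:
    """Stretch contrast independently inside each tile x tile block (stand-in algorithm)."""
    out = [row[:] for row in img]
    for top in range(0, len(img), tile):
        for left in range(0, len(img[0]), tile):
            rows = range(top, min(top + tile, len(img)))
            cols = range(left, min(left + tile, len(img[0])))
            vals = [img[r][c] for r in rows for c in cols]
            lo, hi = min(vals), max(vals)
            if hi == lo:
                continue  # flat tile: nothing to stretch, leave it unchanged
            for r in rows:
                for c in cols:
                    out[r][c] = round((img[r][c] - lo) * 255 / (hi - lo))
    return out


def mosaic(a: Image, b: Image, c: Image, d: Image) -> Image:
    """Tile four equally sized images into a 2x2 grid: a b on top, c d below."""
    top = [ra + rb for ra, rb in zip(a, b)]
    bottom = [rc + rd for rc, rd in zip(c, d)]
    return top + bottom
```

A typical use would apply `local_stretch` to each initial image first and then pass four adjusted images to `mosaic`, mirroring the order of the two steps in the claim.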

8. A security inspection image detection device, comprising:

the slice convolution feature obtaining module is used for processing the security inspection image by using a slice convolution network to obtain a slice convolution feature;

the residual convolution feature obtaining module is used for processing the slice convolution feature by using a multi-residual convolution network to obtain a residual convolution feature;

the pooled attention feature obtaining module is used for processing the residual convolution feature based on an attention pooling network to obtain a pooled attention feature;

the convolution attention fusion obtaining module is used for processing the pooled attention feature and the residual convolution feature based on a feature pyramid network to obtain an attention fusion feature and a convolution attention feature;

the fusion feature obtaining module is used for processing the attention fusion feature and the convolution attention feature based on a path aggregation network to obtain a fusion feature;

9. An electronic device, comprising:

one or more processors;

a memory for storing one or more computer programs,

characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1-7.

10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, realizes the steps of the method according to any one of claims 1-7.