CN113537120A - Convolutional neural network based on complex coordinate attention module and target identification method
- Publication number: CN113537120A
- Application number: CN202110858271.0A
- Authority: CN (China)
- Prior art keywords: complex, module, output, input, feature map
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/213 - Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/24 - Classification techniques
- G06N3/045 - Combinations of networks
- G06N3/08 - Learning methods
- G06F2218/02 - Preprocessing (pattern recognition adapted for signal processing)
- G06F2218/12 - Classification; Matching (pattern recognition adapted for signal processing)
- Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a convolutional neural network based on a complex coordinate attention module, and a target identification method, relating to the field of target identification. The convolutional neural network comprises an input layer, N basic units, a classification unit and an output layer; a processing unit maps complex numbers into corresponding real numbers through a modulus operation for classification and identification. The N basic units comprise first to Nth basic units, each comprising a first complex convolution module, a first complex batch normalization module, a first complex activation module and a first complex pooling module; one of the N basic units further comprises a complex coordinate attention module, which consists of a complex coordinate attention embedding unit and a complex coordinate attention generating unit. The invention realizes high-precision identification of similar space cone targets.
Description
Technical Field
The invention relates to the field of target identification, in particular to a convolutional neural network based on a complex coordinate attention module and a target identification method.
Background
A ballistic missile releases decoys that are highly similar to the warhead during penetration, so the warhead and the decoys must be identified during the missile's flight to reduce interception cost. The warhead and the decoys can be regarded as similar space cone targets whose shapes and motion forms are consistent and whose motion parameters differ only slightly; identification of similar space cone targets therefore plays an important role in space resource utilization, space surveillance and military applications.
In recent years, studies that introduce a Convolutional Neural Network (CNN) for spatial target recognition have been increasing, based on the idea of extracting micro-motion features in the image domain and then using the extracted features for recognition. Li et al. investigated, based on CNN, the identification of spatial targets with diverse shapes and different precession frequencies using a Multi-mode Fusion approach: S-band and X-band one-dimensional range profiles and time-frequency spectrograms of targets with different shapes and precession frequencies are generated with an ideal point-scattering model, and the multimode data are then used as CNN input to identify three targets, namely cone, small cone and cylinder. Bai et al., Xu et al. and Han et al. all take the time-frequency spectrogram of the target as the network input, effectively converting target identification based on micro-motion features into an image recognition problem. Bai et al. designed a three-layer CNN and generated time-frequency spectrograms of three micro-motion forms (spin, precession and nutation) with an ideal point-scattering model; the spectrograms were cropped appropriately and used as CNN input to identify the three common micro-motion forms. Xu et al. designed a six-layer CNN and generated echo signals of four micro-motion forms (spin, rolling, precession and nutation) with a scattering-point model; time-frequency spectrograms spanning several micro-motion periods were obtained through the Wigner-Ville distribution (WVD) and used as CNN input to identify the four micro-motion forms. Han et al. designed a deep learning network consisting of one-dimensional parallel structures and Long Short-Term Memory (LSTM) layers: echoes of five targets with different structural parameters and different micro-motion forms were simulated by electromagnetic computation, time-frequency analysis was performed via the Short-Time Fourier Transform (STFT) to obtain spectrograms of several micro-motion periods, and the spectrograms were fed into the designed network to identify the five targets. Wang et al., based on electromagnetic computation data, obtained range-slow-time images covering more than one precession cycle for three targets with different geometric shapes but the same micro-motion form (cone, cone-cylinder and cone-cylinder-skirt) and fed the images into a designed CNN to realize target recognition.
In the prior art listed above, the common approach keeps the processing from the echo data domain to the image domain as preprocessing and replaces the extraction of micro-motion features in the image domain with a deep convolutional neural network for identification. This approach has the following problems: (1) preprocessing such as time-frequency analysis or range-slow-time imaging is required, which costs considerable signal processing time; (2) the target must be observed continuously for a long time to obtain a complete periodic image of the target; (3) these methods only address targets with different shapes or different micro-motion forms and do not achieve identification of similar space cone targets.
Disclosure of Invention
To solve the above three problems of image-domain CNN-based space cone target identification, the invention combines the advantages of CV-CNN and the attention mechanism, introduces real-domain coordinate attention into the complex domain, and constructs a convolutional neural network based on a complex coordinate attention module together with a target identification method. The aim is to operate directly on radar echo complex data as input, make full use of amplitude and phase information, and realize high-precision identification of similar space cone targets that have the same geometric shape and micro-motion form and only slightly different micro-motion parameters.
To achieve the above object, the present invention provides a convolutional neural network based on a complex coordinate attention module, the convolutional neural network comprising:
an input layer, N basic units, a classification unit and an output layer;
the processing unit is used for mapping complex numbers into corresponding real numbers through a modulus operation and performing classification and identification; the N basic units comprise first to Nth basic units; the first basic unit is connected with the input layer, the output of the first basic unit is the input of the second basic unit, the input of the Nth basic unit is the output of the (N-1)th basic unit, and N is an integer larger than 1; the output of the Nth basic unit is the input of the processing unit, the output of the processing unit is the input of the classifier, and the classifier is connected with the output layer; each of the N basic units comprises: a first complex convolution module, a first complex batch normalization module, a first complex activation module and a first complex pooling module; one of the N basic units further comprises: a complex coordinate attention module; the complex coordinate attention module comprises: a complex coordinate attention embedding unit and a complex coordinate attention generating unit, wherein, for each channel, the complex coordinate attention embedding unit is used for encoding the first complex input feature map of the channel along the horizontal direction and the vertical direction respectively, generating first output feature information of the first complex input feature map encoded along the horizontal direction and second output feature information of the first complex input feature map encoded along the vertical direction;
for each channel, the complex coordinate attention generating unit is configured to: splice the first output feature information and the second output feature information to generate a feature information splicing result of the channel; perform feature dimension reduction on the feature information splicing result of the channel to obtain dimension-reduced feature information, and activate the dimension-reduced feature information to obtain a first complex output feature map of the channel; split the first complex output feature map into a first tensor and a second tensor along the spatial dimension; adjust the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map, obtaining a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; and obtain a third tensor and a fourth tensor, wherein the third tensor is the set of the second complex output feature maps of all channels and the fourth tensor is the set of the third complex output feature maps of all channels;
expressing each element in the third tensor and the fourth tensor in polar coordinate form, constraining the amplitude of the polar coordinates with a constraint function to obtain a fourth complex output feature map and a fifth complex output feature map in the horizontal and vertical spatial directions respectively, expanding the fourth and fifth complex output feature maps to generate attention weight distributions in the horizontal and vertical spatial directions, and applying the attention weight distributions to the complex input feature map of the complex coordinate attention module to obtain the complex output feature map of the complex coordinate attention module;
wherein the complex input feature map and the complex output feature map are both complex feature maps.
When the convolutional neural network based on the complex coordinate attention module is used for target identification, preprocessing such as time-frequency analysis or range-slow-time imaging is not needed, so long signal processing times are avoided and efficiency is higher; the target does not need to be observed continuously for a long time to obtain a complete periodic image, which further improves efficiency; and the network realizes identification of similar space cone targets.
Preferably, in the basic unit not including the complex coordinate attention module, the output of the complex convolution module in the basic unit is the input of the complex batch normalization module, the output of the complex batch normalization module is the input of the complex activation module, and the output of the complex activation module is the input of the complex pooling module.
Preferably, in the basic unit including the complex coordinate attention module, the output of the complex convolution module in the basic unit is the input of the complex batch normalization module, the output of the complex batch normalization module is the input of the complex coordinate attention module, the output of the complex coordinate attention module is the input of the complex activation module, and the output of the complex activation module is the input of the complex pooling module.
Preferably, the classification unit includes:
the second complex convolution module, the second complex batch normalization module, the second complex activation module, the third complex convolution module and the classifier; the output of the second complex convolution module is the input of the second complex batch normalization module, the output of the second complex batch normalization module is the input of the second complex activation module, the output of the second complex activation module is the input of the third complex convolution module, and the output of the third complex convolution module is the input of the classifier.
Preferably, the convolutional neural network includes first to sixth basic units.
Preferably, the sixth basic unit comprises said complex coordinate attention module.
Preferably, an optimizer is arranged in the convolutional neural network and used for updating the network weight and the bias term.
Preferably, the numbers of convolution kernels of the first to sixth basic units are 64, 128, 256 and 256 respectively, the convolution kernels all have size 1 × 3, the sampling windows of the first complex pooling modules all have size 1 × 2, the sliding stride of the convolutions is 1, and the padding is 1.
Preferably, the complex input feature map is a complex input feature map of the spatial target identification signal, and the complex output feature map is a complex output feature map of the spatial target identification signal.
On one hand, the complex coordinate attention (CV-CA) module uses a complex convolutional neural network to obtain the amplitude and phase characteristics of a signal through the associated learning of the real and imaginary parts of complex numbers; on the other hand, through complex coordinate attention it attends to spatial information and channel information in the horizontal and vertical directions simultaneously, better models long-range dependency relationships in the feature information, and enhances the feature characterization capability of the target object.
In channel attention, global pooling is usually used to encode global spatial information, but it compresses that information into a channel descriptor, making it difficult to retain position information, which is particularly important for capturing spatial structure. Therefore, in the complex coordinate attention module, the operation of decomposing global pooling into two one-dimensional feature encodings is extended to the complex domain: the complex feature map X of each channel is encoded along the horizontal and vertical directions respectively (the direction-related encodings are referred to as horizontal and vertical for short), generating direction-related complex feature maps, so that features in the two spatial directions are integrated separately.
The complex coordinate attention embedding unit outputs accurate spatial position information aggregated under a global receptive field. Based on the encoding result of the complex coordinate attention embedding unit, the complex coordinate attention module designs a second transformation, the complex coordinate attention generating unit. This transformation includes three parts: (1) direction-related feature information aggregation, (2) direction-related complex feature map splitting, and (3) automatic complex coordinate attention assignment.
Preferably, in the present invention, X is the complex input feature map of the complex coordinate attention module, X = [x_1, x_2, …, x_C], where x_c is the complex input feature map of the c-th channel; X ∈ ℂ^(C×W×H) is a C × W × H-dimensional complex tensor, ℂ is the complex space, C is the number of channels of the input feature maps, W is the width of each input feature map, and H is the height of each input feature map. Y = [y_1, y_2, …, y_C] is the complex output feature map of the complex coordinate attention module, where y_c is the complex output feature map of the c-th channel, c is an integer satisfying 1 ≤ c ≤ C, and X and Y have the same dimensions.
The output of the p-th channel of the complex input feature map X after encoding along the horizontal direction is z_p^h, and the output of the p-th channel after encoding along the vertical direction is z_p^w, where:
z_p^h(h) = (1/W) · Σ_{j=1}^{W} Re[x_p(h, j)] + j · (1/W) · Σ_{j=1}^{W} Im[x_p(h, j)]
z_p^w(w) = (1/H) · Σ_{i=1}^{H} Re[x_p(i, w)] + j · (1/H) · Σ_{i=1}^{H} Im[x_p(i, w)]
where j denotes the imaginary unit, Re[·] the real part of a complex number and Im[·] the imaginary part; h is the horizontal pixel index of the input feature map, x_p(h, j) is the value at row h and column j of the p-th channel of the complex input feature map, i is the vertical pixel index, and x_p(i, w) is the value at row i and column w of the p-th channel of the complex input feature map.
Preferably, the invention splices z_p^h and z_p^w to obtain the feature information splicing result M = [m_1, m_2, …, m_C], where each tensor in M is expressed as
m_q = [z_q^h, (z_q^w)^T]
where [·, ·] denotes the splicing operation and T denotes transposition.
The complex coordinate attention generating unit performs feature dimension reduction on the feature information splicing result with 1 × 1 convolution kernels; the feature dimension reduction reduces the number of parameters while realizing cross-channel information interaction and integration. Let W_1 = [w_1^1, w_1^2, …, w_1^(C/r)] be the 1 × 1 complex convolution kernels shared by the convolution layer, where w_1^k denotes the k-th complex convolution kernel, k = 1, 2, …, C/r; w_1^(k,q) denotes the q-th 1 × 1 complex convolution kernel in w_1^k, q = 1, 2, …, C; r denotes a scaling coefficient controlling the number of channels of the convolution output feature map; and s denotes the step size of the convolution operation. The k-th feature map of the convolution output is v_k(i, j), where:
v_k(i, j) = Σ_{q=1}^{C} w_1^(k,q) · m_q(i·s, j·s)
f_k(i, j) = σ(v_k(i, j))
where m_q is the q-th tensor in M, m_q(i·s, j·s) is the value at row i·s and column j·s of the q-th tensor after feature information splicing, v_k(i, j) denotes the non-activated complex output feature map of the k-th channel, f_k denotes the complex output feature map of the k-th channel, the set of complex feature maps of all channels is written f = [f_1, f_2, …, f_(C/r)], f_(C/r) is the complex output feature map of the (C/r)-th channel, and σ(·) denotes the complex activation function. The complex activation function is the CReLU function:
σ(z) = ReLU(Re(z)) + j · ReLU(Im(z))
where z is a complex variable.
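As an illustration, the CReLU above simply applies a real-valued ReLU to the real and imaginary parts independently. A minimal NumPy sketch (an illustrative aid, not the patent's reference implementation):

    import numpy as np

    def crelu(z: np.ndarray) -> np.ndarray:
        """CReLU: ReLU applied independently to real and imaginary parts."""
        return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

    print(crelu(np.array([1.0 - 2.0j, -3.0 + 4.0j])))  # [1.+0.j 0.+4.j]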
Preferably, in the present invention, the set of complex feature maps f is split along the spatial dimension into a first tensor f^h = [f_1^h, f_2^h, …, f_(C/r)^h] and a second tensor f^w = [f_1^w, f_2^w, …, f_(C/r)^w],
where f_k^h is the complex output feature map of the k-th channel in the horizontal direction, f_k^w is the complex output feature map of the k-th channel in the vertical direction, and f_(C/r)^h and f_(C/r)^w are the complex output feature maps of the (C/r)-th channel in the horizontal and vertical directions respectively.
Preferably, the present invention uses 1 × 1 complex convolution kernels to restore f^h and f^w to the same dimensions as X, obtaining v^h and v^w, where:
v_l^h(i) = Σ_{o=1}^{C/r} w_h^(l,o) · f_o^h(i)
v_l^w(j) = Σ_{o=1}^{C/r} w_w^(l,o) · f_o^w(j)
where f_o^h is the complex output feature map of channel o in the horizontal direction and f_o^w is that in the vertical direction; w_h^(l,o) is the o-th 1 × 1 complex convolution kernel of w_h^l, o = 1, 2, …, C/r, and w_w^(l,o) is the o-th 1 × 1 complex convolution kernel of w_w^l; v_l^h denotes the second complex output feature map of the l-th channel in the horizontal direction and v^h = [v_1^h, v_2^h, …, v_C^h] is the set of the second complex output feature maps of all channels; v_l^w denotes the third complex output feature map of the l-th channel in the vertical direction and v^w = [v_1^w, v_2^w, …, v_C^w] is the set of the third complex output feature maps of all channels.
preferably, v is defined in the present inventionhAnd said vwEach element in the system is represented in a polar coordinate form, and the amplitude of the polar coordinate is constrained by a Sigmoid function, which specifically comprises the following steps:
wherein,andthe complex output characteristic graphs of the ith channel in the horizontal direction and the ith channel in the vertical direction are respectively the result of the amplitude constraint of the complex output characteristic graphs by a Sigmoid function,andthe phase of the complex output profile for the ith channel in the horizontal and vertical directions respectively,andthe magnitude of the complex output signature for the ith channel in the horizontal and vertical directions respectively,andthe phase of the complex output characteristic diagram of the ith channel in the horizontal direction and the vertical direction respectively, Sig (·) represents a Sigmoid function, and the result after constraint through Sigmoid is recorded asAnd
under a rectangular coordinate system:
preferably, in the present invention, g ishAnd said gwExpanding, generating attention weight distribution in horizontal and vertical space directions, applying the attention weight distribution on a complex input feature map of the complex coordinate attention module to obtain a complex output feature map of the complex coordinate attention module, wherein the complex output feature map of the complex coordinate attention module is yl(i,j):
Wherein x isl(i, j) is the ith row and jth column number of the ith channel complex input feature diagram.
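For illustration, this final weighting step is an element-wise complex product that broadcasts the two directional weights over the feature map. A minimal NumPy sketch under assumed shapes (not the patent's reference code):

    import numpy as np

    def apply_cv_ca_weights(x: np.ndarray, g_h: np.ndarray, g_w: np.ndarray) -> np.ndarray:
        """y_l(i, j) = x_l(i, j) * g_l^h(i) * g_l^w(j) for every channel l.

        x:   (C, H, W) complex input feature map
        g_h: (C, H)    horizontal complex attention weights
        g_w: (C, W)    vertical complex attention weights
        """
        return x * g_h[:, :, None] * g_w[:, None, :]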
The input and the output of the complex coordinate attention module are both complex form feature information, and the complex coordinate attention module can process the complex feature information.
The complex coordinate attention module obtains the amplitude and phase characteristics of the signal through the associated learning of a real part and an imaginary part of a complex number by using a complex convolution neural network.
According to the invention, the complex coordinate attention module attends to spatial information and channel information in the horizontal and vertical directions simultaneously through complex coordinate attention, so that long-range dependency relationships in the feature information are better modeled and the feature representation capability of the target object is enhanced.
The invention also provides a target identification method, which comprises the following steps:
obtaining a target signal;
inputting the target signal into the convolutional neural network based on the complex coordinate attention module;
and the convolutional neural network outputs a target recognition result.
Preferably, the target signal is radar echo data.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
the invention integrates the advantages of CV-CNN and an attention mechanism, introduces the coordinate attention of a real number domain into a complex number domain, constructs a convolutional neural network and a target identification method based on a complex coordinate attention module, can take radar echo complex data as input data to carry out direct operation, fully utilizes amplitude and phase information, and realizes the high-precision identification of similar space cone targets with the same geometric shape and micromotion form and slightly different micromotion parameters.
When the convolutional neural network based on the complex coordinate attention module is used for target identification, preprocessing such as time-frequency analysis or range-slow-time imaging is not needed, so long signal processing times are avoided and efficiency is higher; the target also does not need to be observed continuously for a long time to obtain a complete periodic image, which further improves efficiency.
The convolutional neural network based on the complex coordinate attention module can realize high-precision identification on similar space cone targets with the same micromotion form and only slightly different micromotion parameters.
According to the end-to-end similar space cone target identification method, the radar echo complex data are input and the identification result is output, so that echo signal preprocessing and phase information loss are avoided, and the time required by identification is remarkably shortened.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention;
FIG. 1 is a schematic diagram of a convolutional neural network based on a complex coordinate attention module;
FIG. 2 is a schematic diagram of the composition of the complex coordinate attention module;
FIG. 3 is a schematic flowchart of a complex input feature map processing method.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflicting with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described and thus the scope of the present invention is not limited by the specific embodiments disclosed below.
It should be understood that the terms "a" and "an" indicate that the number of an element may be one in one embodiment and plural in another embodiment; these terms should not be interpreted as limiting the number of elements.
Example one
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a convolutional neural network based on a complex coordinate attention module. The convolutional neural network includes:
an input layer, N basic units, a classification unit and an output layer;
the processing unit is used for mapping complex numbers into corresponding real numbers through a modulus operation and performing classification and identification; the N basic units comprise first to Nth basic units; the first basic unit is connected with the input layer, the output of the first basic unit is the input of the second basic unit, the input of the Nth basic unit is the output of the (N-1)th basic unit, and N is an integer larger than 1; the output of the Nth basic unit is the input of the processing unit, the output of the processing unit is the input of the classifier, and the classifier is connected with the output layer; each of the N basic units comprises: a first complex convolution module, a first complex batch normalization module, a first complex activation module and a first complex pooling module; one of the N basic units further comprises: a complex coordinate attention module; the complex coordinate attention module comprises: a complex coordinate attention embedding unit and a complex coordinate attention generating unit, wherein, for each channel, the complex coordinate attention embedding unit is used for encoding the first complex input feature map of the channel along the horizontal direction and the vertical direction respectively, generating first output feature information of the first complex input feature map encoded along the horizontal direction and second output feature information of the first complex input feature map encoded along the vertical direction;
for each channel, the complex coordinate attention generating unit is configured to: splice the first output feature information and the second output feature information to generate a feature information splicing result of the channel; perform feature dimension reduction on the feature information splicing result of the channel to obtain dimension-reduced feature information, and activate the dimension-reduced feature information to obtain a first complex output feature map of the channel; split the first complex output feature map into a first tensor and a second tensor along the spatial dimension; adjust the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map, obtaining a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; and obtain a third tensor and a fourth tensor, wherein the third tensor is the set of the second complex output feature maps of all channels and the fourth tensor is the set of the third complex output feature maps of all channels;
expressing each element in the third tensor and the fourth tensor in polar coordinate form, constraining the amplitude of the polar coordinates with a constraint function to obtain a fourth complex output feature map and a fifth complex output feature map in the horizontal and vertical spatial directions respectively, expanding the fourth and fifth complex output feature maps to generate attention weight distributions in the horizontal and vertical spatial directions, and applying the attention weight distributions to the complex input feature map of the complex coordinate attention module to obtain the complex output feature map of the complex coordinate attention module;
wherein the complex input feature map and the complex output feature map are both complex feature maps.
The invention combines the advantages of CV-CNN and the attention mechanism, introduces real-domain coordinate attention into the complex domain, and constructs a complex coordinate attention network. An attention mechanism can capture long-distance dependencies through global information search, automatically focus on important information through weight distribution, and ignore unimportant redundant information, which is useful for similar space cone target identification under a short observation duration. The attention mechanism has undergone a development from spatial attention and channel attention to space-channel attention. That spatial-dimension and channel-dimension information improves network identification capability has been demonstrated again in the recently proposed Coordinate Attention (CA) module, which improves model performance by embedding spatial position information into channel attention.
A Complex-Valued Convolutional Neural Network (CV-CNN) can directly process complex echo data, make full use of amplitude and phase information, avoid echo preprocessing, and reduce recognition time.
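One common way to realize a complex convolution from real-valued building blocks follows (a + jb)(w_r + j·w_i) = (a·w_r − b·w_i) + j(a·w_i + b·w_r). The PyTorch sketch below is an assumption about how such a module could look, not the patent's implementation; the 1 × 3 kernels of the network are written as a 1-D convolution with kernel size 3:

    import torch
    import torch.nn as nn

    class ComplexConv1d(nn.Module):
        """Complex convolution built from two real convolutions that share
        one complex weight: y_r = conv_r(x_r) - conv_i(x_i),
        y_i = conv_r(x_i) + conv_i(x_r)."""

        def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3,
                     stride: int = 1, padding: int = 1):
            super().__init__()
            self.conv_r = nn.Conv1d(in_ch, out_ch, kernel_size, stride, padding)
            self.conv_i = nn.Conv1d(in_ch, out_ch, kernel_size, stride, padding)

        def forward(self, x_r: torch.Tensor, x_i: torch.Tensor):
            # real and imaginary parts are carried as two real tensors
            y_r = self.conv_r(x_r) - self.conv_i(x_i)
            y_i = self.conv_r(x_i) + self.conv_i(x_r)
            return y_r, y_i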
CV-CANet is built on the CV-CA module. It is an end-to-end complex convolutional neural network whose architecture is shown in FIG. 1. Each basic unit of the network consists of four basic modules, namely complex convolution, complex batch normalization, complex activation and complex pooling, and a CV-CA module is embedded in the sixth layer. The numbers of convolution kernels of the first to sixth layers are 64, 128, 256 and 256 respectively; every convolution kernel has size 1 × 3, the sampling window of each pooling layer has size 1 × 2, the sliding stride of all convolution layers is 1, and the padding is 1. The last two layers of the network replace the traditional full connection with full convolution to reduce model parameters. The output of the final full convolution layer is complex, while the class label of the target is real; therefore the complex numbers are mapped to corresponding real numbers through a modulus operation and then sent to a Softmax classifier for classification and identification. The loss function is the cross-entropy loss. Adaptive Moment Estimation (Adam) serves as the optimizer for updating the network weights and bias terms.
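The modulus-then-Softmax mapping at the end of the network can be sketched as follows; the shapes and function name are assumptions of this sketch, not the patent's reference code:

    import torch

    def modulus_softmax(z_r: torch.Tensor, z_i: torch.Tensor) -> torch.Tensor:
        """Map the complex full-convolution output to real class probabilities.

        z_r, z_i: (batch, num_classes) real and imaginary parts of the
        final full-convolution output.
        """
        modulus = torch.sqrt(z_r ** 2 + z_i ** 2)  # complex modulus, a real number
        return torch.softmax(modulus, dim=1)       # class probabilities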
The number of layers of the complex convolutional neural network, the number of basic units, and which basic unit the CV-CA module is embedded in are not limited by the invention, and can be flexibly adjusted according to actual requirements.
According to the CV-CANet-based end-to-end identification method for similar space cone targets, the identification result is obtained directly from the input radar echo data, which avoids complex echo signal preprocessing and the loss of phase information. To process radar echo complex signals directly, the invention proposes the CV-CA module and constructs CV-CANet based on it. The invention introduces the coordinate attention mechanism into the complex domain and derives and establishes the basic structures of direction-related complex feature information aggregation, direction-related complex feature map splitting and automatic complex coordinate attention assignment. Effective identification can be realized for similar space cone targets with the same micro-motion form and only slightly different micro-motion parameters.
The method is usually carried out under the condition that the observation duration does not exceed half a period. In practice, a radar cannot observe one target for a long time, the data may be noisy, or data may be lost; it is therefore desirable to identify the target well with less data while guaranteeing real-time performance.
Example two
Referring to FIG. 2, FIG. 2 is a schematic diagram of the composition of the complex coordinate attention module. In this embodiment, the complex coordinate attention module includes: a complex coordinate attention embedding unit and a complex coordinate attention generating unit, wherein, for each channel, the complex coordinate attention embedding unit is used for encoding the first complex input feature map of the channel along the horizontal direction and the vertical direction respectively, generating first output feature information of the first complex input feature map encoded along the horizontal direction and second output feature information of the first complex input feature map encoded along the vertical direction;
for each channel, the complex coordinate attention generating unit is configured to: splice the first output feature information and the second output feature information to generate a feature information splicing result of the channel; perform feature dimension reduction on the feature information splicing result of the channel to obtain dimension-reduced feature information, and activate the dimension-reduced feature information to obtain a first complex output feature map of the channel; split the first complex output feature map into a first tensor and a second tensor along the spatial dimension; adjust the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map, obtaining a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; and obtain a third tensor and a fourth tensor, wherein the third tensor is the set of the second complex output feature maps of all channels and the fourth tensor is the set of the third complex output feature maps of all channels;
expressing each element in the third tensor and the fourth tensor in polar coordinate form, constraining the amplitude of the polar coordinates with a constraint function to obtain a fourth complex output feature map and a fifth complex output feature map in the horizontal and vertical spatial directions respectively, expanding the fourth and fifth complex output feature maps to generate attention weight distributions in the horizontal and vertical spatial directions, and applying the attention weight distributions to the complex input feature map of the complex coordinate attention module to obtain the complex output feature map of the complex coordinate attention module;
wherein the complex input feature map and the complex output feature map are both complex feature maps.
Prior art CV-CNNs simply separate the real and imaginary parts of a complex number or use real convolution kernels, and thus do not take advantage of complex convolution kernels. Therefore, following the rules of complex arithmetic, the invention carries out a detailed formula derivation and constructs a complex coordinate attention (CV-CA) module from the complex network basic unit and the Real-Valued Coordinate Attention (RV-CA) module.
The CV-CA module proposed by the present invention includes a Complex Coordinate Attention Information Embedding (CVCIE) unit and a Complex Coordinate Attention Generation (CVCAG) unit.
In practical applications, the input of the CV-CA module may be any feature information in complex form. The invention is described taking a radar echo signal as an example, but the input feature information of the invention is not limited to radar echo signals. For narrowband radar echo data H = 1; the value of H in practical applications can be determined according to the actual situation and is not specifically limited by the invention.
The echo signal measured by the radar can be expressed as:
S_t(n) = S_th(n) + ν(n)
where S_th(n) is the theoretical radar echo signal, ν(n) denotes independent, identically distributed white Gaussian noise generated by the radar receiver, and n denotes the pulse sequence number.
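A small NumPy sketch of this measurement model; the theoretical echo and noise level used here are placeholders for illustration, not the patent's target model:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 512                                   # number of pulses (assumed)
    n = np.arange(N)                          # pulse sequence number
    s_th = np.exp(1j * 2 * np.pi * 0.01 * n)  # placeholder theoretical echo S_th(n)
    sigma = 0.1                               # noise standard deviation (assumed)
    v = sigma / np.sqrt(2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    s_t = s_th + v                            # measured echo S_t(n) = S_th(n) + v(n)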
Let X = [x_1, x_2, …, x_C] ∈ ℂ^(C×W×H) be the complex input feature map, where x_p denotes the complex input feature map of the p-th channel; let Y = [y_1, y_2, …, y_C] be the complex output feature map, where y_p denotes the complex output feature map of the p-th channel and Y has the same dimensions as X.
Global pooling is usually used in channel attention to encode global spatial information, but it compresses that information into a channel descriptor, making it difficult to retain position information, which is particularly important for capturing spatial structure. Therefore, in the coordinate attention module, the operation of decomposing global pooling into two one-dimensional feature encodings is extended to the complex domain: the complex feature map of each channel of X is encoded along the horizontal and vertical directions respectively (the direction-related encodings are abbreviated as horizontal and vertical), generating direction-related complex feature maps so that features in the two spatial directions are integrated separately. This operation is described mathematically as:
z_p^h(h) = (1/W) · Σ_{j=1}^{W} Re[x_p(h, j)] + j · (1/W) · Σ_{j=1}^{W} Im[x_p(h, j)]   (1)
z_p^w(w) = (1/H) · Σ_{i=1}^{H} Re[x_p(i, w)] + j · (1/H) · Σ_{i=1}^{H} Im[x_p(i, w)]   (2)
where j denotes the imaginary unit, Re[·] the real part of a complex number and Im[·] the imaginary part; h is the horizontal pixel index of the input feature map, x_p(h, j) is the value at row h and column j of the p-th channel of the complex input feature map, i is the vertical pixel index, and x_p(i, w) is the value at row i and column w of the p-th channel. Transforming the complex feature map of each channel of X yields two complex tensors, Z^h = [z_1^h, z_2^h, …, z_C^h] and Z^w = [z_1^w, z_2^w, …, z_C^w].
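A minimal NumPy sketch of equations (1) and (2); because the real and imaginary parts are averaged separately, the two one-dimensional encodings reduce to complex means along the width and height axes. The shapes are assumptions of this sketch:

    import numpy as np

    def coordinate_pool(x: np.ndarray):
        """Directional encodings of a complex feature map x of shape (C, H, W).

        z_h[p, h]: mean over the W columns of row h of channel p  -> eq. (1)
        z_w[p, w]: mean over the H rows of column w of channel p  -> eq. (2)
        """
        z_h = x.mean(axis=2)   # (C, H)
        z_w = x.mean(axis=1)   # (C, W)
        return z_h, z_w

    x = np.random.randn(4, 2, 16) + 1j * np.random.randn(4, 2, 16)
    z_h, z_w = coordinate_pool(x)
    print(z_h.shape, z_w.shape)   # (4, 2) (4, 16)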
The CVCIE above outputs accurate spatial position information aggregated under a global receptive field. Based on the CVCIE encoding results, the CV-CA module designs a second transformation, referred to as CVCAG. The CVCAG transformation includes three steps: (1) direction-related feature information aggregation, (2) direction-related complex feature map splitting, and (3) automatic complex coordinate attention assignment.
(1) Direction-related complex feature information aggregation
Complex splicing. The results obtained from equations (1) and (2) are spliced. Let M = [m_1, m_2, …, m_C] be the spliced result; each tensor in M is expressed as:
m_q = [z_q^h, (z_q^w)^T]   (3)
where [·, ·] denotes the splicing operation and T denotes transposition.
Feature dimension reduction. A 1 × 1 convolution kernel is used to reduce the dimension of the feature channels, reducing the number of parameters and realizing cross-channel information interaction and integration. Let W_1 = [w_1^1, w_1^2, …, w_1^(C/r)] be the 1 × 1 complex convolution kernels shared by the layer, where w_1^k denotes the k-th complex convolution kernel, k = 1, 2, …, C/r; w_1^(k,q) denotes the q-th 1 × 1 complex convolution kernel in w_1^k, q = 1, 2, …, C; r is a scaling coefficient controlling the number of channels of the convolution output feature map (r is 18 in the present invention; r may take other values in practical applications and this embodiment is not specifically limited); and s is the step size of the convolution operation. The k-th feature map of the convolution output is v_k(i, j), where:
v_k(i, j) = Σ_{q=1}^{C} w_1^(k,q) · m_q(i·s, j·s)   (4)
f_k(i, j) = σ(v_k(i, j))   (5)
where m_q is the q-th tensor in M, m_q(i·s, j·s) is the value at row i·s and column j·s of the q-th tensor after feature information splicing, v_k(i, j) denotes the non-activated complex output feature map of the k-th channel, f_k denotes the complex output feature map of the k-th channel, the set of complex feature maps of all channels is written f = [f_1, f_2, …, f_(C/r)], f_(C/r) is the complex output feature map of the (C/r)-th channel, and σ(·) denotes the complex activation function. The complex activation function is the CReLU function:
σ(z) = ReLU(Re(z)) + j · ReLU(Im(z))   (6)
where z is a complex variable.
(2) Direction-related complex feature map splitting
The complex feature map is split. f is split along the spatial dimension into two independent tensors f^h = [f_1^h, f_2^h, …, f_(C/r)^h] and f^w = [f_1^w, f_2^w, …, f_(C/r)^w], namely:
f = [f^h, f^w]   (7)
Feature dimension raising. 1 × 1 complex convolution kernels are used to restore f^h and f^w to the same dimensions as the input feature map X. Let W_h = [w_h^1, w_h^2, …, w_h^C] be the 1 × 1 complex convolution kernels of the horizontal-direction convolution operation, where w_h^l denotes the l-th complex convolution kernel (l = 1, 2, …, C) and w_h^(l,o) denotes the o-th (o = 1, 2, …, C/r) 1 × 1 complex convolution kernel in w_h^l. Likewise, W_w = [w_w^1, w_w^2, …, w_w^C] are the 1 × 1 complex convolution kernels of the vertical-direction convolution operation, where w_w^l denotes the l-th complex convolution kernel and w_w^(l,o) denotes the o-th 1 × 1 complex convolution kernel in w_w^l. Then:
v_l^h(i) = Σ_{o=1}^{C/r} w_h^(l,o) · f_o^h(i)   (8)
v_l^w(j) = Σ_{o=1}^{C/r} w_w^(l,o) · f_o^w(j)   (9)
where f_o^h is the complex output feature map of channel o in the horizontal direction and f_o^w is that in the vertical direction; v_l^h denotes the second complex output feature map of the l-th channel in the horizontal direction and v^h = [v_1^h, v_2^h, …, v_C^h] is the set of the second complex output feature maps of all channels; v_l^w denotes the third complex output feature map of the l-th channel in the vertical direction and v^w = [v_1^w, v_2^w, …, v_C^w] is the set of the third complex output feature maps of all channels.
(3) Automatic complex coordinate attention assignment
Direction-related complex attention weight coefficients are calculated. Each element (a complex value) of the complex feature-map tensors v^h and v^w is written in polar coordinate form, and the amplitude of the polar coordinates is then constrained with a Sigmoid function, limiting it to the value range 0-1, namely:
v_l^h = ρ_l^h · e^(j·θ_l^h)   (10)
g_l^h = Sig(ρ_l^h) · e^(j·θ_l^h)   (11)
g^h = [g_1^h, g_2^h, …, g_C^h]   (12)
v_l^w = ρ_l^w · e^(j·θ_l^w)   (13)
g_l^w = Sig(ρ_l^w) · e^(j·θ_l^w)   (14)
and g^w = [g_1^w, g_2^w, …, g_C^w], where g_l^h and g_l^w are the results of constraining the amplitudes of the complex output feature maps of the l-th channel in the horizontal and vertical directions with the Sigmoid function; ρ_l^h and ρ_l^w are the amplitudes, and θ_l^h and θ_l^w the phases, of the complex output feature maps of the l-th channel in the horizontal and vertical directions respectively; Sig(·) denotes the Sigmoid function, used to convert the final amplitude into a value between 0 and 1. Since only the amplitude of the polar coordinates is converted to the 0-1 range, the phase is unaffected, i.e. the phase information is preserved.
Since the original image is in the rectangular coordinate system, the expressions of equations (11) and (14) are converted to the rectangular coordinate system:
g_l^h = Sig(ρ_l^h) · cos θ_l^h + j · Sig(ρ_l^h) · sin θ_l^h   (15)
g_l^w = Sig(ρ_l^w) · cos θ_l^w + j · Sig(ρ_l^w) · sin θ_l^w   (16)
complex coordinate attention is automatically assigned. G output for horizontal and vertical spatial directionshAnd gwAnd expanding, generating attention weight distribution in each space direction, and acting on the complex input feature diagram to realize automatic distribution of complex coordinate attention. The output of the complex coordinate attention module is obtained as:
wherein x isl(i, j) is the ith row and jth column number of the ith channel complex input feature diagram.
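Putting the CVCIE encodings and the three CVCAG steps together, the forward pass of the CV-CA module can be sketched end to end in NumPy. The 1 × 1 complex convolutions are written as complex matrix products over the channel axis, stride s = 1, and all weight shapes are assumptions of this sketch, not the patent's reference code:

    import numpy as np

    def crelu(z):
        return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

    def constrain_amplitude(v):
        sig = 1.0 / (1.0 + np.exp(-np.abs(v)))
        return sig * np.exp(1j * np.angle(v))      # Sigmoid amplitude, preserved phase

    def cv_ca_forward(x, W1, Wh, Ww):
        """x: (C, H, W) complex; W1: (C//r, C); Wh, Ww: (C, C//r) complex weights."""
        C, H, W = x.shape
        z_h = x.mean(axis=2)                       # (C, H)      eq. (1)
        z_w = x.mean(axis=1)                       # (C, W)      eq. (2)
        m = np.concatenate([z_h, z_w], axis=1)     # (C, H+W)    splicing, eq. (3)
        f = crelu(W1 @ m)                          # (C//r, H+W) eqs. (4)-(6)
        f_h, f_w = f[:, :H], f[:, H:]              # split along the spatial dim, eq. (7)
        g_h = constrain_amplitude(Wh @ f_h)        # (C, H)      eqs. (8), (11)
        g_w = constrain_amplitude(Ww @ f_w)        # (C, W)      eqs. (9), (14)
        return x * g_h[:, :, None] * g_w[:, None, :]   # eq. (17)

    rng = np.random.default_rng(1)
    C, H, W, r = 8, 4, 16, 2
    x = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))
    W1 = rng.standard_normal((C // r, C)) + 1j * rng.standard_normal((C // r, C))
    Wh = rng.standard_normal((C, C // r)) + 1j * rng.standard_normal((C, C // r))
    Ww = rng.standard_normal((C, C // r)) + 1j * rng.standard_normal((C, C // r))
    print(cv_ca_forward(x, W1, Wh, Ww).shape)      # (8, 4, 16)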
According to the above CV-CA construction process, on one hand the CV-CA uses a complex convolutional neural network to obtain the amplitude and phase characteristics of a target signal, such as a radar echo signal, through associated learning of the real and imaginary parts of complex numbers; on the other hand, through complex coordinate attention it attends to spatial information and channel information in the horizontal and vertical directions simultaneously, better models long-range dependency relationships in the feature information, and enhances the feature characterization capability of the target object.
The CV-CA module provided by the invention comprises two parts, wherein the first part is complex coordinate attention embedding, and the second part is complex coordinate attention generation. The physical significance of each part is explained in detail below.
The first part is complex coordinate attention embedding. In the field of computer vision, the position information on the feature map has an important influence on acquiring the structural features of the space. Since the targets to be resolved by the invention are very similar space cone targets, the invention considers that the space structure information is beneficial to the resolution and identification of the targets. Therefore, in order for the proposed complex coordinate attention module to retain the position information and further capture the long distance dependency on the space by using the position information, the present invention decomposes the global pooling in CNN into pooling operation in the horizontal direction and pooling operation in the vertical direction.
The second part is complex coordinate attention generation, which is done in three sub-steps. For this section, the general design principle of the present invention has three points: 1) the modules should be as simple and light-weight as possible. 2) The module should make full use of the spatial location information obtained in the first part. 3) The module should take into account the interrelationship between the channels in order to take advantage of the channel attention.
Direction-related feature information aggregation. The first part has obtained spatial position information in both the horizontal and vertical directions. Under the principle that the designed module is as simple as possible and the parameter quantity is as small as possible, the invention firstly splices the spatial position information in the horizontal direction and the vertical direction, and aims to simultaneously retain the information in the two directions. Then, the invention uses 1 × 1 convolution kernel to convolute the splicing result for dimension reduction. By the design, the characteristic information among the channels is considered, and the parameter quantity is reduced.
Direction-related complex feature map splitting. The invention expects the horizontal weights and the vertical weights to be applied to the horizontal and vertical directions of the input feature map respectively, and the number of channels of the weights to be consistent with the number of channels of the input feature map. Therefore the aggregated result of the first sub-step, which accounts for both spatial position information and channel information, is split into a horizontal-direction part and a vertical-direction part, and each part is then dimension-raised with its own 1 × 1 convolution kernel.
Automatic complex attention assignment. After the above steps, the invention obtains complex weights that account for both spatial position information and channel information. On one hand the phase information of the complex weights is retained; on the other hand the magnitudes of the weights are limited to the range 0-1. Finally, the attention weights are applied to each element and each channel of the input feature map, realizing the complex coordinate attention proposed by the invention. The method can weight each channel and attend to important channels; it also takes spatial information into account and focuses on regions that facilitate target recognition.
In addition, the CV-CA module in this embodiment obtains strong feature recognition capability with only a small increase in parameters while maintaining model operation efficiency, improving the recognition capability of the model and reducing the probability of target misjudgment.
Example three
A third embodiment of the present invention provides a complex input feature map processing method. Referring to FIG. 3, FIG. 3 is a schematic flowchart of the complex input feature map processing method. The method includes:
obtaining a complex input characteristic diagram to be processed;
coding the first complex input characteristic diagram of the channel along the horizontal direction and the vertical direction respectively, and generating first output characteristic information of the first complex input characteristic diagram coded along the horizontal direction and second output characteristic information coded along the vertical direction in the channel respectively;
for each channel, splicing the first output characteristic information and the second output characteristic information to generate a characteristic information splicing result of the channel; performing feature dimensionality reduction on the feature information splicing result of the channel to obtain feature information after dimensionality reduction, and activating the feature information after dimensionality reduction to obtain a first complex output feature map of the channel; splitting the first complex output profile into a first tensor and a second tensor along a spatial dimension; adjusting the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map, and obtaining a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; obtaining a third tensor and a fourth tensor, wherein the third tensor is the set of the second complex output characteristic maps of all the channels, and the fourth tensor is the set of the third complex output characteristic maps of all the channels;
expressing each element in the third tensor and the fourth tensor in polar form, constraining the amplitude of each element with a constraint function to obtain a fourth complex output feature map and a fifth complex output feature map in the horizontal and vertical spatial directions respectively, expanding the fourth and fifth complex output feature maps to generate attention weight distributions in the horizontal and vertical spatial directions, and applying the attention weight distributions to the complex input feature map to be processed to obtain a processed complex output feature map;
wherein both the complex input feature map and the complex output feature map are complex-valued feature maps.
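The following self-contained sketch composes the steps above into one module (PyTorch assumed; the pooling operator, the ReLU-style activation, the sigmoid constraint, and complex multiplication as the way of "applying" the weights are all illustrative assumptions, since the method names the steps but not these concrete operators):

```python
import torch
import torch.nn as nn

class ComplexCoordinateAttention(nn.Module):
    """Illustrative sketch of the complex input feature map processing
    method; layer names and the reduction ratio are assumptions."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)
        # Shared 1x1 complex convolution for the dimensionality reduction.
        self.red_re = nn.Conv2d(channels, mid, 1)
        self.red_im = nn.Conv2d(channels, mid, 1)
        # Per-direction 1x1 complex convolutions for dimensionality raising.
        self.rh_re, self.rh_im = nn.Conv2d(mid, channels, 1), nn.Conv2d(mid, channels, 1)
        self.rv_re, self.rv_im = nn.Conv2d(mid, channels, 1), nn.Conv2d(mid, channels, 1)

    @staticmethod
    def _cconv(x_re, x_im, c_re, c_im):
        # Complex convolution: (W_re + i W_im)(x_re + i x_im).
        return c_re(x_re) - c_im(x_im), c_re(x_im) + c_im(x_re)

    @staticmethod
    def _constrain(z_re, z_im):
        # Polar form: sigmoid-squashed amplitude, preserved phase.
        amp = torch.sigmoid(torch.sqrt(z_re ** 2 + z_im ** 2 + 1e-8))
        ph = torch.atan2(z_im, z_re)
        return amp * torch.cos(ph), amp * torch.sin(ph)

    def forward(self, x_re, x_im):
        b, c, h, w = x_re.shape
        # 1) Encode each channel along the two spatial directions
        #    (average pooling over the other direction).
        zh_re, zh_im = x_re.mean(3, keepdim=True), x_im.mean(3, keepdim=True)  # (B,C,H,1)
        zv_re, zv_im = x_re.mean(2, keepdim=True), x_im.mean(2, keepdim=True)  # (B,C,1,W)
        # 2) Splice the two encodings along the spatial dimension.
        cat_re = torch.cat([zh_re, zv_re.transpose(2, 3)], dim=2)  # (B,C,H+W,1)
        cat_im = torch.cat([zh_im, zv_im.transpose(2, 3)], dim=2)
        # 3) Dimensionality reduction and activation (ReLU on both parts).
        f_re, f_im = self._cconv(cat_re, cat_im, self.red_re, self.red_im)
        f_re, f_im = torch.relu(f_re), torch.relu(f_im)
        # 4) Split into the first (horizontal) and second (vertical) tensors.
        fh_re, fv_re = f_re.split([h, w], dim=2)
        fh_im, fv_im = f_im.split([h, w], dim=2)
        fv_re, fv_im = fv_re.transpose(2, 3), fv_im.transpose(2, 3)
        # 5) Raise both branches back to C channels (matching the input).
        gh_re, gh_im = self._cconv(fh_re, fh_im, self.rh_re, self.rh_im)
        gv_re, gv_im = self._cconv(fv_re, fv_im, self.rv_re, self.rv_im)
        # 6) Constrain amplitudes; phases of the complex weights are kept.
        ah_re, ah_im = self._constrain(gh_re, gh_im)  # (B,C,H,1)
        av_re, av_im = self._constrain(gv_re, gv_im)  # (B,C,1,W)
        # 7) Apply both attention maps by broadcast complex multiplication.
        t_re = x_re * ah_re - x_im * ah_im
        t_im = x_re * ah_im + x_im * ah_re
        y_re = t_re * av_re - t_im * av_im
        y_im = t_re * av_im + t_im * av_re
        return y_re, y_im

# Usage: the output has the same shape as the complex input feature map.
cca = ComplexCoordinateAttention(channels=64)
x_re, x_im = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
y_re, y_im = cca(x_re, x_im)
```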
The method can be used for processing complex feature information. The complex convolutional neural network obtains the amplitude and phase characteristics of the signal through correlated learning of the real and imaginary parts; through complex coordinate attention, the method attends to spatial information and channel information in the horizontal and vertical directions, better models long-range dependencies in the feature information, and enhances the feature characterization capability of the target object.
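For concreteness, the real-imaginary correlated learning mentioned here rests on the complex product, from which amplitude and phase can be read off; a few illustrative lines (PyTorch's native complex tensors, as an assumption about tooling):

```python
import torch

w = torch.complex(torch.randn(3), torch.randn(3))  # complex "weights"
x = torch.complex(torch.randn(3), torch.randn(3))  # complex "signal"

# Real and imaginary parts mix in the product:
# Re(y) = Re(w)Re(x) - Im(w)Im(x),  Im(y) = Re(w)Im(x) + Im(w)Re(x).
y = w * x
amplitude = y.abs()    # |y| = sqrt(Re(y)^2 + Im(y)^2)
phase = y.angle()      # arg(y) = atan2(Im(y), Re(y))
```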
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A convolutional neural network based on a complex coordinate attention module, said convolutional neural network comprising:
an input layer, N basic units, a classification unit and an output layer;
the processing unit is used for mapping complex numbers into corresponding real numbers through a modulus operation and performing classification and recognition; the N basic units comprise first to Nth basic units; the first basic unit is connected with the input layer, the output of the first basic unit is the input of the second basic unit, the input of the Nth basic unit is the output of the (N-1)th basic unit, and N is an integer greater than 1; the output of the Nth basic unit is the input of the processing unit, the output of the processing unit is the input of the classifier, and the classifier is connected with the output layer; each of the N basic units comprises: a first complex convolution module, a first complex batch normalization module, a first complex activation module and a first complex pooling module; wherein one of the N basic units further comprises a complex coordinate attention module; the complex coordinate attention module comprises: a complex coordinate attention embedding unit and a complex coordinate attention generating unit, wherein, for each channel, the complex coordinate attention embedding unit is used for encoding a first complex input feature map of the channel along the horizontal direction and the vertical direction respectively, and generating first output feature information encoded along the horizontal direction and second output feature information encoded along the vertical direction;
for each channel, the complex coordinate attention generating unit is used to: splice the first output feature information and the second output feature information to generate a feature information splicing result for the channel; perform feature dimensionality reduction on the splicing result to obtain reduced feature information, and activate the reduced feature information to obtain a first complex output feature map of the channel; split the first complex output feature map into a first tensor and a second tensor along the spatial dimension; adjust the dimensions of the first tensor and the second tensor to be the same as the dimensions of the first complex input feature map, obtaining a second complex output feature map of the channel in the horizontal direction and a third complex output feature map of the channel in the vertical direction; and obtain a third tensor and a fourth tensor, wherein the third tensor is the set of the second complex output feature maps of all channels and the fourth tensor is the set of the third complex output feature maps of all channels;
express each element in the third tensor and the fourth tensor in polar form, constrain the amplitude of each element with a constraint function to obtain a fourth complex output feature map and a fifth complex output feature map in the horizontal and vertical spatial directions respectively, expand the fourth and fifth complex output feature maps to generate attention weight distributions in the horizontal and vertical spatial directions, and apply the attention weight distributions to the complex input feature map of the complex coordinate attention module to obtain the complex output feature map of the complex coordinate attention module;
wherein both the complex input feature map and the complex output feature map are complex-valued feature maps.
2. The complex coordinate attention module based convolutional neural network of claim 1, wherein in the basic unit not comprising the complex coordinate attention module, the output of the complex convolution module in the basic unit is the input of the complex batch normalization module, the output of the complex batch normalization module is the input of the complex activation module, and the output of the complex activation module is the input of the complex pooling module.
3. The complex coordinate attention module based convolutional neural network of claim 1, wherein in the basic unit comprising the complex coordinate attention module, the output of the complex convolution module in the basic unit is the input of the complex batch normalization module, the output of the complex batch normalization module is the input of the complex coordinate attention module, the output of the complex coordinate attention module is the input of the complex activation module, and the output of the complex activation module is the input of the complex pooling module.
4. The complex coordinate attention module based convolutional neural network of claim 1, wherein said classification unit comprises:
the second complex convolution module, the second complex batch normalization module, the second complex activation module, the third complex convolution module and the classifier; the output of the second complex convolution module is the input of the second complex batch normalization module, the output of the second complex batch normalization module is the input of the second complex activation module, the output of the second complex activation module is the input of the third complex convolution module, and the output of the third complex convolution module is the input of the classifier.
5. The complex coordinate attention module based convolutional neural network of claim 1, comprising first through sixth basic units.
6. The complex coordinate attention module based convolutional neural network of claim 5, wherein the sixth basic unit comprises the complex coordinate attention module.
7. The complex coordinate attention module based convolutional neural network of claim 1, wherein an optimizer is provided in the convolutional neural network for updating network weights and bias terms.
8. The complex coordinate attention module based convolutional neural network of claim 1, wherein the complex input feature map is a complex input feature map of a spatial target recognition signal, and the complex output feature map is a complex output feature map of the spatial target recognition signal.
9. An object recognition method, characterized in that the method comprises:
obtaining a target signal;
inputting the target signal into the complex coordinate attention module based convolutional neural network of any one of claims 1-5;
and the convolutional neural network outputs a target recognition result.
10. The method of claim 9, wherein the target signal is radar echo data.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110858271.0A (granted as CN113537120B) | 2021-07-28 | 2021-07-28 | Complex convolution neural network target identification method based on complex coordinate attention |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113537120A (en) | 2021-10-22 |
| CN113537120B (en) | 2023-04-07 |
Family
ID=78121256
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110858271.0A (CN113537120B, Active) | Complex convolution neural network target identification method based on complex coordinate attention | 2021-07-28 | 2021-07-28 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113537120B (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9373059B1 (en) * | 2014-05-05 | 2016-06-21 | Atomwise Inc. | Systems and methods for applying a convolutional network to spatial data |
| CN111340186A (en) * | 2020-02-17 | 2020-06-26 | 之江实验室 | Compressed representation learning method based on tensor decomposition |
| CN112329538A (en) * | 2020-10-10 | 2021-02-05 | 杭州电子科技大学 | A Target Classification Method Based on Microwave Vision |
| CN112965062A (en) * | 2021-02-09 | 2021-06-15 | 西安电子科技大学 | Radar range profile target identification method based on LSTM-DAM network |
Non-Patent Citations (3)
| Title |
|---|
| Yaxin Li et al.: "Multi-mode Fusion and Classification Method for Space Targets Based on Convolutional Neural Network" * |
| Zhou Jian: "Target Classification Methods for Complex Scenes" * |
| Wang Guangguang: "Deep Learning Based PolSAR Classification and Video Action Recognition" * |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114972280A (en) * | 2022-06-07 | 2022-08-30 | 重庆大学 | Fine Coordinate Attention Module and Its Application in Surface Defect Detection |
| CN114972280B (en) * | 2022-06-07 | 2023-11-17 | 重庆大学 | Fine coordinate attention module and application thereof in surface defect detection |
| CN116050474A (en) * | 2022-12-29 | 2023-05-02 | 上海天数智芯半导体有限公司 | A convolution calculation method, SOC chip, electronic equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113537120B (en) | 2023-04-07 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Wang et al. | Review of image fusion based on pulse-coupled neural network | |
| Wang et al. | Optimization-based post-training quantization with bit-split and stitching | |
| CN112818764A (en) | Low-resolution image facial expression recognition method based on feature reconstruction model | |
| CN113537120B (en) | Complex convolution neural network target identification method based on complex coordinate attention | |
| CN113569735B (en) | Complex input feature graph processing method and system based on complex coordinate attention module | |
| CN118230758A (en) | Underwater sound target identification method and system based on combination of encoder and convolution | |
| CN114724245A (en) | CSI-based incremental learning human body action identification method | |
| CN116612364B (en) | SAR image target generation method based on information maximization generation countermeasure network | |
| El-Bana et al. | Evaluating the potential of wavelet pooling on improving the data efficiency of light-weight cnns | |
| CN117725466A (en) | Unmanned aerial vehicle radio frequency signal identification method based on pulse neural network | |
| CN116888605A (en) | Calculation methods, training methods and devices of neural network models | |
| Liu et al. | Radar-based human motion recognition using semisupervised triple-GAN | |
| CN116012877A (en) | An Attention Mechanism Based Human Pose Recognition Method Based on Millimeter Wave Radar 3D Point Cloud | |
| CN117994672A (en) | Remote sensing image on-board target identification method | |
| Korti et al. | Advanced Human Activity Recognition through Data Augmentation and Feature Concatenation of Micro-Doppler Signatures | |
| Zhang et al. | No-reference image quality assessment using independent component analysis and convolutional neural network | |
| CN119049479B (en) | Voice spoofing attack detection method and device | |
| Jiang et al. | Gesture Recognition Using sEMG Based on Multi-Scale Fusion Convolution and Channel Attention | |
| CN110781764A (en) | Intelligent microwave sign language identification method | |
| CN120612561B (en) | Small target visual detection method under geological exploration scene | |
| CN116058852B (en) | Classification system, method, electronic device and storage medium for MI-EEG signals | |
| Liu et al. | KENet: Distilling Convolutional Networks via Knowledge Enhancement | |
| Chen et al. | Tensor-Based Chaotic Convolutional Neural Network for Remote Sensing Data Classification | |
| CN118311530A (en) | Multi-channel radar human behavior recognition method and system based on scattering separation | |
| Dave et al. | Insect recognition: A visionary step towards smart agriculture |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |