CN112907627B

CN112907627B - System, method, apparatus, processor and computer readable storage medium for realizing accurate tracking of small sample targets

Info

Publication number: CN112907627B
Application number: CN202110176675.1A
Authority: CN
Inventors: 赵锐; 吴松洋; 李宁; 毛翌
Original assignee: Third Research Institute of the Ministry of Public Security
Current assignee: Third Research Institute of the Ministry of Public Security
Priority date: 2021-02-07
Filing date: 2021-02-07
Publication date: 2024-02-02
Anticipated expiration: 2041-02-07
Also published as: CN112907627A

Abstract

The invention relates to a system for realizing accurate tracking of a small sample target, which comprises: a portrait acquisition module for acquiring portrait view information from a video; a portrait access convergence module for accessing and forwarding portrait view information and converging the acquired portrait view information through the acquisition interface; a portrait view library for storing portrait pictures, portrait video clips and passerby records; a portrait analysis module for providing portrait feature extraction together with portrait 1:1 comparison, 1:N search, N:N clustering and N:M collision comparison retrieval services; and a portrait application service supporting module for sending task instructions to the portrait analysis module through the analysis interface. The invention also relates to a method for realizing accurate tracking of a small sample target. The system, method, device, processor and computer readable storage medium remove external interference caused by complex backgrounds and occlusion by means such as image segmentation, reduce the false alarm rate of single-dimensional biometric features, and have a wider range of application.

Description

System, method, apparatus, processor and computer readable storage medium for realizing accurate tracking of small sample targets

Technical Field

The invention relates to the field of artificial intelligence, in particular to the field of visual target tracking, and specifically relates to a system, a method, a device, a processor and a computer readable storage medium thereof for realizing accurate tracking of a small sample target.

Background

Visual target tracking (Visual Object Tracking) is an important problem in the field of computer vision. It supports applications such as human body tracking, face tracking, vehicle tracking in traffic monitoring systems, gesture tracking in intelligent interaction systems, and automatic target tracking by unmanned aerial vehicles. Although widely studied in recent years, target tracking has attracted somewhat less research attention than basic visual tasks such as object detection and semantic segmentation, because the problem is difficult and high-quality data is scarce. The development of deep learning and the growth of computing power have rapidly advanced the performance of vision algorithms, yet methods based on deep neural networks have only appeared in the target tracking field within the last few years, making them a late arrival compared with other vision tasks.

The key to target tracking is to segment the target completely, extract features reasonably and identify the target accurately, while keeping the running time of the algorithm low enough to guarantee real-time performance. Because pedestrians have the characteristics of both rigid and flexible objects, their appearance is easily affected by complex factors such as clothing, changes in posture and viewing angle, illumination, occlusion and the environment, so human body tracking faces huge technical challenges. Limited by cross-resource video, image capture conditions and pedestrian appearance, visual target tracking has low accuracy and reliability in application and loses its practical value in most real application scenarios.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a system, a method, a device, a processor and a computer readable storage medium thereof for realizing accurate tracking of a small sample target, wherein the system, the method, the device, the processor and the computer readable storage medium meet the requirements of refinement, structuring, efficient control and real-time pursuit.

To achieve the above object, the system, method, device, processor and computer readable storage medium of the present invention for realizing accurate tracking of a small sample target are as follows:

the system for realizing the accurate tracking of the small sample target is mainly characterized by comprising the following components:

the portrait acquisition module is used for acquiring portrait view information from the video;

the portrait access convergence module is connected with the portrait acquisition module and is used for accessing and forwarding portrait view information and converging the acquired portrait view information through the acquisition interface;

the portrait view library is connected with the portrait access convergence module and is used for storing portrait pictures, portrait video clips and passerby records;

the portrait analysis module is connected with the portrait view library and is used for providing portrait feature extraction together with portrait 1:1 comparison, 1:N search, N:N clustering and N:M collision comparison retrieval services;

and the portrait application service supporting module is connected with the portrait view library and the portrait analysis module and is used for sending task instructions to the portrait analysis module through the analysis interface, with the results being returned by the portrait analysis module and the portrait view library.

Preferably, the system further comprises an upper-level portrait view library and a lower-level portrait view library, wherein the upper-level portrait view library and the lower-level portrait view library are connected with each other through a cascade interface for networking, and are used for transmitting upper-level or lower-level portrait view information.

Preferably, the portrait view library receives and stores the portrait view information sent by the portrait access convergence module and the portrait analysis module; it provides portrait pictures to the analysis module through the data service interface for extracting portrait feature values and providing the analysis results to the portrait application service supporting module.

Preferably, the portrait view information includes a portrait video clip, a portrait picture and a passerby record.

The method for realizing the accurate tracking of the small sample target based on the system is mainly characterized by comprising the following steps of:

(1) The portrait acquisition module acquires portrait view information from the video through acquisition equipment and forwards the portrait view information to the portrait access convergence module;

(2) The portrait view library aggregates the portrait view information transmitted by the portrait access aggregation module or the portrait upper and lower level view library, and performs classified storage;

(3) The portrait application service supporting module inputs a portrait, and issues an instruction for searching a specific portrait or a place sequence where the portrait appears to the portrait analyzing module;

(4) The portrait analysis module extracts portrait features from the portrait view library through a pedestrian re-identification technology, and judges whether a specific pedestrian appears in an image or video sequence in the portrait view library;

(5) The portrait analysis module returns the location and time information of the capturing camera to the portrait application service supporting module, and issues a command so that the portrait view library sends the matched images and videos to the portrait application service supporting module.
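Steps (3) to (5) can be illustrated with a minimal sketch. The `Capture` record layout, the cosine-similarity matching rule, the 0.8 threshold and all names below are assumptions made for illustration; the patent does not specify how records are stored or how a match is decided.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Capture:
    """A stored portrait capture (hypothetical record layout)."""
    capture_id: str
    camera_location: str
    captured_at: str          # ISO 8601 string, so string order equals time order
    feature: Sequence[float]  # feature vector produced by the analysis module


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_appearances(query_feature: Sequence[float],
                     library: List[Capture],
                     threshold: float = 0.8) -> List[Capture]:
    """Return the captures that match the query, ordered by capture time."""
    hits = [c for c in library
            if cosine_similarity(query_feature, c.feature) >= threshold]
    return sorted(hits, key=lambda c: c.captured_at)


library = [
    Capture("a", "Gate 2", "2021-02-07T09:30:00", (0.9, 0.1)),
    Capture("b", "Lobby", "2021-02-07T09:10:00", (0.0, 1.0)),
    Capture("c", "Gate 1", "2021-02-07T09:00:00", (1.0, 0.05)),
]
for hit in find_appearances((1.0, 0.0), library):
    print(hit.captured_at, hit.camera_location)
```

Sorting the matches by capture time yields the place sequence in which the queried person appeared, which is the information returned to the application service supporting module in step (5).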

Preferably, the step (4) specifically includes the following steps:

(4.1) pedestrian detection is carried out, and a pedestrian image is obtained;

(4.2) cropping the pedestrian image;

(4.3) taking the cropped pedestrian images from different fields of view as the input of a network, decomposing each image into sub-images for the different color channels, and processing the sub-images separately;

(4.4) performing convolution filtering operation on each sub-image in the convolution layer to obtain the responses of different local image blocks as local features;

(4.5) combining all local features to form a feature map as an output of the convolutional layer;

(4.6) performing a max pooling operation or an average pooling operation on the generated feature map in the pooling layer;

(4.7) projecting the feature map obtained by the pooling layer to a one-dimensional feature space at the full-connection layer to form a feature vector of the pedestrian image;

(4.8) judging, by means of a binary classification function, whether the two images of the input pair show the same pedestrian;

(4.9) extracting features aiming at the input image to obtain feature expression vectors for distinguishing different pedestrians;

and (4.10) carrying out similarity measurement according to the feature expression vector, and sequencing the images according to the similarity, wherein the image with the highest similarity is used as a recognition result.
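Steps (4.3) to (4.7) can be sketched as a single forward pass in plain Python. This is only an illustration under stated assumptions: the kernel, the fully connected weights and the function names are hypothetical, one shared kernel is used for every channel, and the "convolution" is the unflipped sliding-window product commonly used in deep learning frameworks. A real network would learn its weights and stack several such layers.

```python
from typing import List

Matrix = List[List[float]]


def split_channels(image: List[List[List[float]]]) -> List[Matrix]:
    """Step (4.3): split an H x W x C image into C single-channel sub-images."""
    channels = len(image[0][0])
    return [[[pixel[c] for pixel in row] for row in image]
            for c in range(channels)]


def conv2d_valid(sub_image: Matrix, kernel: Matrix) -> Matrix:
    """Steps (4.4)-(4.5): slide the kernel over the sub-image; each response
    is a local feature and the grid of responses is the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(sub_image) - kh + 1
    out_w = len(sub_image[0]) - kw + 1
    return [[sum(sub_image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Step (4.6): keep the maximum of each non-overlapping size x size window."""
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


def fully_connected(vector: List[float], weights: Matrix) -> List[float]:
    """Step (4.7): project the flattened maps to a one-dimensional feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def extract_feature(image, kernel: Matrix, fc_weights: Matrix) -> List[float]:
    maps = [max_pool(conv2d_valid(ch, kernel)) for ch in split_channels(image)]
    flat = [value for fm in maps for row in fm for value in row]
    return fully_connected(flat, fc_weights)


# A 3 x 3 image with two channels: channel 0 counts 0..8, channel 1 is all ones.
image = [[[float(3 * i + j), 1.0] for j in range(3)] for i in range(3)]
kernel = [[1.0, 1.0], [1.0, 1.0]]
print(extract_feature(image, kernel, [[1.0, 0.0], [0.5, 0.5]]))
```

Replacing `max` with a mean over the same window gives the average pooling variant mentioned in step (4.6).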

The device for realizing the accurate tracking of the small sample target is mainly characterized by comprising the following components:

a processor configured to execute computer-executable instructions;

and a memory storing one or more computer-executable instructions which, when executed by the processor, perform the steps of the method for achieving accurate tracking of small sample targets described above.

The processor for realizing the accurate tracking of the small sample target is mainly characterized in that the processor is configured to execute computer executable instructions, and the computer executable instructions realize the steps of the method for realizing the accurate tracking of the small sample target when being executed by the processor.

The computer readable storage medium is characterized in that the computer program is stored thereon, and the computer program can be executed by a processor to realize the steps of the method for realizing the accurate tracking of the small sample target.

The system, method, device, processor and computer readable storage medium of the invention for realizing accurate tracking of a small sample target adopt pedestrian re-identification technology, following the evolution of person image search from simple backgrounds to complex backgrounds, from no occlusion to occlusion, and from small variation to large variation. External interference caused by complex backgrounds and occlusion is removed by means such as image segmentation, and appearance differences caused by changes in viewing angle, resolution, posture, illumination and the like are removed by locating key points and improving system robustness. This solves the problem that targets cannot be identified, or are identified incorrectly, when massive numbers of targets are recognized, particularly across resources. The false alarm rate of single-dimensional biometric features is reduced, and the invention has a wider range of application.

Drawings

Fig. 1 is a schematic diagram of the connection of the modules of the system for realizing the accurate tracking of the small sample target.

Detailed Description

In order to more clearly describe the technical contents of the present invention, a further description will be made below in connection with specific embodiments.

The system for realizing the accurate tracking of the small sample target comprises the following components:

As the preferred implementation mode of the invention, the system also comprises an upper-level and lower-level portrait view library which is connected with the portrait view library and is used for transmitting upper-level or lower-level portrait view information through cascade interface networking.

As a preferred implementation mode of the invention, the portrait view library receives and stores the portrait view information sent by the portrait access convergence module and the portrait analysis module; it provides portrait pictures to the analysis module through the data service interface for extracting portrait feature values and providing the analysis results to the portrait application service supporting module.

As a preferred embodiment of the present invention, the portrait view information includes a portrait video clip, a portrait picture, and a passerby record.

The method for realizing the accurate tracking of the small sample target based on the system comprises the following steps:

(4.1) pedestrian detection is carried out, and a pedestrian image is obtained;

(4.2) cropping the pedestrian image;

(4.4) performing a convolution filtering operation on each sub-image in the convolution layer to obtain the responses of different local image blocks as local features;

(4.10) carrying out similarity measurement according to the feature expression vector, and sequencing the images according to the similarity, wherein the image with the highest similarity is used as a recognition result;

As a preferred embodiment of the present invention, the apparatus for achieving accurate tracking of a small sample target includes:

a processor configured to execute computer-executable instructions;

As a preferred embodiment of the present invention, the processor for achieving accurate tracking of a small sample target is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the method for achieving accurate tracking of a small sample target described above.

As a preferred embodiment of the present invention, the computer-readable storage medium has stored thereon a computer program executable by a processor to perform the steps of the method for achieving accurate tracking of small sample targets described above.

In a specific embodiment of the invention, the invention provides a system that produces a fine-grained, rapid and structured description of a sensitive human body target. Through technologies such as online modeling of the sensitive target, cross-resource target identification and cross-domain target identification, it solves the problem that targets cannot be identified or are identified incorrectly because of factors such as angle, distance and occlusion, and realizes efficient control and real-time pursuit based on video targets.

The small sample target accurate tracking system, as shown in figure 1, comprises a portrait acquisition module, a portrait access convergence module, a portrait view library, a portrait analysis module, an upper and lower stage portrait view library and a portrait application service support module.

The portrait acquisition module is used for acquiring portrait view information from a video, including portrait video clips, portrait pictures and passerby records.

The portrait access convergence module is used for accessing and forwarding portrait view information; the acquired portrait view information is converged and forwarded to the portrait view library through the acquisition interface.

The portrait view library mainly comprises functional modules such as interfaces, applications and management, and is used for storing portrait pictures, portrait video clips, passerby records and the like. In business and logical terms it is divided into one-person-one-file archives produced by portrait clustering, a case-related portrait library, a file subject library, a distribution subject library, a passerby library and the like.

The portrait analysis module provides portrait feature extraction together with comparison and retrieval services such as portrait 1:1 comparison, 1:N search, N:N clustering and N:M collision.
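The four comparison modes can be sketched over plain feature vectors as follows. The cosine-similarity measure, the 0.8 threshold, the greedy clustering rule and all function names are assumptions chosen for illustration; the patent names the services but does not define how they are computed.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compare_1_to_1(a: Vector, b: Vector, threshold: float = 0.8) -> bool:
    """1:1 comparison: do two portraits show the same person?"""
    return cosine(a, b) >= threshold


def search_1_to_n(query: Vector, gallery: Dict[str, Vector],
                  top_k: int = 3) -> List[Tuple[str, float]]:
    """1:N search: the top_k gallery entries most similar to one query."""
    scored = [(pid, cosine(query, feat)) for pid, feat in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


def cluster_n_to_n(gallery: Dict[str, Vector],
                   threshold: float = 0.8) -> List[List[str]]:
    """N:N clustering: greedily group entries similar to a cluster's first member."""
    clusters: List[List[str]] = []
    for pid, feat in gallery.items():
        for cluster in clusters:
            if cosine(feat, gallery[cluster[0]]) >= threshold:
                cluster.append(pid)
                break
        else:
            clusters.append([pid])
    return clusters


def collide_n_to_m(set_a: Dict[str, Vector], set_b: Dict[str, Vector],
                   threshold: float = 0.8) -> List[Tuple[str, str]]:
    """N:M collision: every cross-set pair that appears to be the same person."""
    return [(ida, idb)
            for ida, fa in set_a.items()
            for idb, fb in set_b.items()
            if cosine(fa, fb) >= threshold]


gallery = {"a": (1.0, 0.0), "b": (0.99, 0.1), "c": (0.0, 1.0)}
print(cluster_n_to_n(gallery))
print(collide_n_to_m({"x": (0.0, 1.0)}, gallery))
```

All four modes reduce to pairwise similarity between feature vectors, which is why a single feature extraction step can serve every retrieval service.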

The portrait view library supports receiving and storing the portrait view information sent by the portrait access convergence module and the portrait analysis module.

The view library provides portrait pictures to the analysis module through the data service interface for extracting portrait feature values and providing the analysis results to the application support module.

The upper-level and lower-level portrait view libraries are networked through a cascade interface.
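A cascade of view libraries can be pictured as a tree in which records travel upward or downward between levels. The sketch below is hypothetical: the class, its method names and the propagation rules are assumptions, since the patent states only that the libraries are networked through a cascade interface to transmit upper-level or lower-level portrait view information.

```python
from typing import List, Optional


class ViewLibrary:
    """A portrait view library node in a cascade (hypothetical model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional["ViewLibrary"] = None
        self.children: List["ViewLibrary"] = []
        self.records: List[str] = []

    def attach_child(self, child: "ViewLibrary") -> None:
        """Network a lower-level library beneath this one."""
        child.parent = self
        self.children.append(child)

    def store(self, record: str, report_up: bool = False) -> None:
        """Store a record locally and optionally pass it to every upper level."""
        self.records.append(record)
        if report_up and self.parent is not None:
            self.parent.store(record, report_up=True)

    def distribute(self, record: str) -> None:
        """Store a record locally and push it down to every lower level."""
        self.records.append(record)
        for child in self.children:
            child.distribute(record)


city = ViewLibrary("city")
district = ViewLibrary("district")
city.attach_child(district)

district.store("capture-001", report_up=True)  # lower level reports upward
city.distribute("watchlist-001")               # upper level distributes downward
print(city.records, district.records)
```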

The portrait application service supporting module sends a task instruction to the portrait analyzing module through the analysis interface, and the portrait analyzing module and the view library return results.

The method by which the small sample target accurate tracking system realizes video target tracking comprises the following steps:

(1) Connecting the portrait acquisition module with the portrait access convergence module;

(2) Connecting the portrait access convergence module with a portrait view library;

(3) The portrait view library is respectively connected with a portrait analysis module, an upper and lower stage portrait view library and a portrait application service supporting module;

(4) Connecting the portrait analyzing module with a portrait application service supporting module;

(5) The portrait acquisition module acquires portrait view information from the video through acquisition equipment and forwards the portrait view information to the portrait access convergence module;

(6) The portrait access convergence module or the portrait upper and lower-level view library converges and forwards the received portrait view information to the portrait view library for classified storage;

(7) The portrait application service supporting module inputs a portrait and issues an instruction for searching a specific portrait or a place sequence where the specific portrait appears to the portrait analyzing module;

(8) The portrait analysis module uses a pedestrian re-identification technology to extract portrait features from the portrait view library and judges whether a specific pedestrian appears in an image or video sequence in the view library;

(9) The portrait analysis module returns the location and time information of the capturing camera to the portrait application service supporting module, and issues a command so that the portrait view library sends the matched images and videos to the portrait application service supporting module.

The pedestrian re-identification technology follows the evolution of person image search from simple backgrounds to complex backgrounds, from no occlusion to occlusion, and from small variation to large variation. It removes external interference caused by complex backgrounds and occlusion by means such as image segmentation, and removes appearance differences caused by changes in viewing angle, resolution, posture, illumination and the like by locating key points and improving system robustness. It specifically comprises the following steps:

(1) Pedestrian detection is carried out, and a pedestrian image is obtained;

(2) Cutting out a pedestrian image;

(3) Taking the cropped pedestrian images from different fields of view as the network input, decomposing each image into sub-images for the different color channels, and processing the sub-images separately;

(4) For each sub-image, performing a convolution filtering operation in the convolution layer to obtain the responses of different local image blocks as local features;

(5) Combining all local features to form a feature map as the output of the convolution layer;

(6) Performing a max or average pooling operation on the generated feature map in the pooling layer, thereby greatly reducing the number of training parameters and the occurrence of overfitting;

(7) The convolution layer and the pooling layer can appear for a plurality of times to obtain abstract and multi-level description of pedestrians;

(8) Projecting the feature map obtained by the pooling layer to a one-dimensional feature space at the full-connection layer to form a feature vector of the pedestrian image;

(9) In the final Softmax layer, judging by means of a binary classification function whether the two images of the input pair show the same pedestrian;

(10) Extracting stable and robust features from an input image to obtain feature expression vectors capable of describing and distinguishing different pedestrians;

(11) And finally, carrying out similarity measurement according to the feature expression vector, and sequencing the images according to the similarity, wherein the image with the highest similarity is used as a final recognition result.
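Steps (9) and (11) can be sketched as a two-class Softmax over a pair of feature vectors followed by a similarity ranking of the gallery. The verification head below (a linear layer on the absolute feature difference), the Euclidean-distance ranking and every name and weight are assumptions for illustration; the patent does not specify the form of the binary function or the similarity measure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def same_person_probability(feat_a: Vector, feat_b: Vector,
                            weights: Sequence[Vector],
                            bias: Vector) -> float:
    """Step (9): two-class softmax on the absolute feature difference.
    Class 0 means different pedestrians, class 1 means the same pedestrian."""
    diff = [abs(x - y) for x, y in zip(feat_a, feat_b)]
    logits = [sum(w * d for w, d in zip(row, diff)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)[1]


def rank_gallery(query: Vector,
                 gallery: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Step (11): rank gallery images by Euclidean distance to the query,
    most similar (smallest distance) first."""
    def distance(feat: Vector) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(query, feat)))
    return sorted(((gid, distance(feat)) for gid, feat in gallery.items()),
                  key=lambda pair: pair[1])


# Hypothetical, hand-picked weights: a larger difference raises the class 0 logit.
weights = [(1.0, 1.0), (0.0, 0.0)]
bias = (0.0, 1.0)
print(same_person_probability((1.0, 2.0), (1.0, 2.0), weights, bias))

ranking = rank_gallery((0.0, 0.0), {"p": (3.0, 4.0), "q": (1.0, 0.0), "r": (0.0, 2.0)})
print(ranking[0])  # the top-ranked image is taken as the recognition result
```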

The invention realizes the recognition, search and discovery of sensitive targets in typical application environments and across scenes. Through pedestrian re-identification technology it solves the problem that targets cannot be recognized and tracked when they are occluded or the background is complex, thereby realizing automatic association of event targets and improving the efficiency with which public security authorities apply massive volumes of video in police work.

The specific implementation manner of this embodiment may be referred to the related description in the foregoing embodiment, which is not repeated herein.

It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.

It should be noted that in the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise indicated, the meaning of "plurality" means at least two.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

Those of ordinary skill in the art will appreciate that all or part of the steps carried out in the method of the above embodiments may be implemented by a program to instruct related hardware, and the corresponding program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product.

The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

In this specification, the invention has been described with reference to specific embodiments thereof. It will be apparent, however, that various modifications and changes may be made without departing from the spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims

1. A system for achieving accurate tracking of a small sample target, said system comprising:

2. The system for realizing accurate tracking of a small sample target according to claim 1, further comprising an upper and a lower stage portrait view libraries connected with the portrait view libraries and networked through a cascade interface for transmitting upper stage or lower stage portrait view information.

3. The system for realizing accurate tracking of a small sample target according to claim 1, wherein the portrait view library receives and stores portrait view information sent by the portrait access convergence module and the portrait analysis module; it provides portrait pictures to the analysis module through the data service interface for extracting portrait feature values and providing the analysis results to the portrait application service supporting module.

4. The system for achieving accurate tracking of small sample objects according to claim 1, wherein said portrait view information includes portrait video clips, portrait pictures, and passerby records.

5. A method for achieving accurate tracking of small sample targets based on the system of claim 1, comprising the steps of:

6. The method for realizing accurate tracking of a small sample object according to claim 5, wherein the step (4) specifically comprises the steps of:

(4.1) pedestrian detection is carried out, and a pedestrian image is obtained;

(4.2) cropping the pedestrian image;

7. An apparatus for achieving accurate tracking of a small sample target, said apparatus comprising:

a processor configured to execute computer-executable instructions;

a memory storing one or more computer-executable instructions which, when executed by the processor, perform the steps of the method of achieving accurate tracking of small sample targets of claim 5 or 6.

8. A processor for achieving accurate tracking of small sample targets, wherein the processor is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the method for achieving accurate tracking of small sample targets of claim 5 or 6.

9. A computer readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method of achieving accurate tracking of small sample targets of claim 5 or 6.