
CN111401269B - Commodity hot spot detection method, device and equipment based on monitoring video - Google Patents


Info

Publication number
CN111401269B
CN111401269B
Authority
CN
China
Prior art keywords
image
detection
frame
preset
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010196433.4A
Other languages
Chinese (zh)
Other versions
CN111401269A
Inventor
杨淼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Yunstare Technology Co ltd
Original Assignee
Chengdu Yunstare Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Yunstare Technology Co ltd filed Critical Chengdu Yunstare Technology Co ltd
Priority to CN202010196433.4A
Publication of CN111401269A
Application granted
Publication of CN111401269B
Legal status: Active


Classifications

    • G06V20/40 Scenes; scene-specific elements in video content
    • G06F18/2321 Non-hierarchical clustering techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/25 Fusion techniques
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a commodity hot spot detection method, device and equipment based on surveillance video. The method comprises the following steps: extracting video images frame by frame from a real-time surveillance video of commodities; performing person entry/exit detection on each frame in sequence; if a person is detected entering the detection area, generating a differential gradient image and a binary feature image of the detection area from the current video image and a preset background image; if the number of non-zero pixels in the binary feature image exceeds a preset first threshold, taking the differential gradient image as the detection image; counting the hot spot counts of different commodities from the detection image by an information clustering method; and displaying the hot spot counts in the real-time surveillance video. With this scheme, the hot spot counts and a heat map of the commodities in the detection area can be output in real time in a complex store environment, with no special requirements on commodity placement angles, so the workload of store operators and managers can be greatly reduced.

Description

Commodity hot spot detection method, device and equipment based on monitoring video
Technical Field
The application relates to the technical field of computer vision, in particular to a commodity hot spot detection method, device and equipment based on a monitoring video.
Background
At present, retail stores selling all kinds of commodities generally install cameras to monitor the goods on sale: first, to prevent theft; second, so that the merchant can learn from the surveillance video which commodities attract the most attention, i.e. which are the hot spot commodities, in order to improve the precision of daily store management, marketing strategies and stocking plans.
For the second application there are currently several implementation methods. The first is manual statistics, i.e. continuously watching the surveillance video with the naked eye, whose disadvantage is high labor cost. The second is image-based object detection, which must recognise the categories of the detected commodities; because commodities are so varied, the detection model must be iterated continuously, so the model becomes extremely large, storage is difficult, the computation load is heavy and the computation time is long.
Disclosure of Invention
The application provides a commodity hot spot detection method, device and equipment based on surveillance video, which are used to solve the problems of existing detection methods for commodity hot spots: high labor cost or a complex detection process.
The above object of the present application is achieved by the following technical solutions:
in a first aspect, an embodiment of the present application provides a method for detecting a commodity hot spot based on a surveillance video, including:
extracting video images frame by frame from real-time monitoring video of commodities;
sequentially carrying out personnel access detection on each frame of video image so as to detect whether personnel enter a preset detection area; wherein the detection area is the area where the commodity is located;
if the person is detected to enter the detection area, generating a differential gradient image and a binary characteristic image of the detection area based on the current video image and a preset background image; the preset background image is an initial background image set by a user or an image obtained by performing orthogonal projection transformation on the initial background image;
if the number of the pixel points with the pixel values not being 0 in the binary characteristic image is larger than a preset first threshold value, taking the differential gradient image as a detection image;
based on the detection image, counting the number of hot spots of different commodities by an information clustering method;
and displaying the hot spot times in the real-time monitoring video.
Optionally, before extracting the video image frame by frame from the real-time monitoring video of the commodity, the method further includes:
Acquiring an initial background image set by a user, and acquiring a detection area set by the user based on the initial background image;
based on a user instruction, if the user selects to perform orthogonal projection transformation, calculating to obtain a projection transformation matrix according to the pixel coordinates of the detection area, and performing orthogonal projection transformation on an initial background image set by the user by using the projection transformation matrix to serve as the preset background image; and if the user selects not to perform orthogonal projection transformation, taking the initial background image as the preset background image.
Optionally, the step of sequentially performing personnel access detection on each frame of video image to detect whether a person enters a preset detection area includes:
sequentially carrying out foreground detection on the detection area of each frame of video image, and calculating to obtain a foreground pixel duty ratio;
if the foreground pixel duty ratio is larger than a preset second threshold value, counting the number of frames of high foreground images in continuous images of a first preset number of frames; wherein the high foreground image is an image with a foreground pixel ratio greater than the second threshold;
if the frame number of the high foreground image is larger than a preset third threshold value, judging whether the foreground pixel ratio of continuous images with at least a second preset number of frames continuously becomes larger in the high foreground image; wherein the second preset number is less than or equal to the first preset number;
If the foreground pixel ratio of the continuous images with at least the second preset number of frames continuously increases, determining that a person enters a preset detection area.
Optionally, the generating the differential gradient image and the binary feature image of the detection area based on the current video image and the preset background image includes:
generating a current gradient image and a current salient image of the detection area from the current video image; if the preset background image is an image subjected to orthogonal projection transformation, the current video image is subjected to orthogonal projection transformation, and then the current gradient image and the current significant image are generated;
performing difference on the basis of the current gradient image and a template gradient image which is generated in advance by the background image to obtain a difference gradient image;
based on a preset self-adaptive threshold value, binarizing the current gradient image to obtain a binary gradient image, and binarizing the current significant image to obtain a binary significant image;
and calculating to obtain a binary characteristic image based on the binary gradient image and the binary significant image.
Optionally, if the number of pixels of the binary feature image is greater than a preset first threshold, after taking the differential gradient image as the detection image, the method further includes:
Updating the template gradient image, and taking the current gradient image as an updated template gradient image.
Optionally, the counting, based on the detected image, the number of hot spots of different commodities by using an information clustering method includes:
executing a sliding window algorithm on the current frame detection image;
sequentially judging whether the number of the pixel points with the pixel values not being 0 in each sliding window is larger than 0, and if the number is larger than 0, generating a primary detection frame containing all the pixel points with the pixel values not being 0 in the current sliding window;
based on the size information and the coincidence degree information of the detection frames, all the first-level detection frames in the current frame detection image are adjacently combined to obtain a plurality of second-level detection frames;
performing coordinate limitation on each secondary detection frame to obtain a plurality of tertiary detection frames with coordinates limited in a preset coordinate threshold range;
based on the size information and the coincidence degree information of the detection frames, carrying out cluster fusion on each three-level detection frame in the current frame detection image and the final-level detection frame of the previous frame detection image to obtain the final-level detection frame in the current frame detection image; every time the clustering fusion is completed, the number of hot spots of the commodity in the corresponding position is increased by 1;
Wherein, the detection frames at all levels are rectangular in shape.
Optionally, the method further comprises:
and performing de-duplication and information verification on the final detection frame in the current frame detection image, thereby improving the accuracy of the obtained commodity hot spot times.
Optionally, the displaying the hot spot times in the real-time monitoring video includes:
and displaying the hot spot times in a real-time monitoring video in a thermodynamic diagram or digital form.
In a second aspect, an embodiment of the present application further provides a commodity hot spot detection device based on a monitoring video, including:
the extraction module is used for extracting video images frame by frame from the real-time monitoring video of the commodity;
the detection module is used for sequentially carrying out personnel access detection on each frame of video image so as to detect whether personnel enter a preset detection area; wherein the detection area is the area where the commodity is located;
the generation module is used for generating a differential gradient image and a binary characteristic image of the detection area based on the current video image and a preset background image if the person is detected to enter the detection area; the preset background image is an initial background image set by a user or an image obtained by performing orthogonal projection transformation on the initial background image;
The setting module is used for taking the differential gradient image as a detection image if the number of the pixel points with the pixel values not being 0 in the binary characteristic image is larger than a preset first threshold value;
the statistics module is used for counting the number of hot spots of different commodities by an information clustering method based on the detection image;
and the display module is used for displaying the hot spot times in the real-time monitoring video.
In a third aspect, an embodiment of the present application further provides a commodity hot spot detection device based on a surveillance video, including:
a memory and a processor coupled to the memory;
the memory is used for storing a program, the program at least being used for executing the above commodity hot spot detection method based on surveillance video;
the processor is used for calling and executing the program stored in the memory.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
according to the technical scheme provided by the embodiment of the application, the monitoring video of the commodity is processed and analyzed, so that the number of times of touching the commodity can be identified based on a clustering statistics method, and the number of hot spots of the commodity is determined; before clustering statistics, whether a detection area is set by a person is determined, if so, the detection of the commodity hot spot times is performed, so that detection errors caused by light change, camera shake and other reasons can be avoided, and the detection accuracy is improved. Therefore, compared with the traditional method, the technical scheme provided by the application can output the number of times of the commodity hot spots and the related thermodynamic diagram in the detection area in real time under the complex store environment, and has no special requirement on the commodity placement angle, so that the workload of store operators and management staff can be greatly reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic flow chart of a commodity hot spot detection method based on a surveillance video according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a commodity hot spot detection device based on a monitoring video according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a commodity hot spot detection device based on a surveillance video according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a commodity hot spot detection system based on a surveillance video according to an embodiment of the present application;
FIG. 5 is a schematic workflow diagram of a configuration module of the system of FIG. 4;
FIG. 6 is a schematic workflow diagram of an initialization module of the system of FIG. 4;
FIG. 7 is a schematic diagram of a workflow of the system personnel access detection module of FIG. 4;
FIG. 8 is a schematic workflow diagram of a detection image generation module of the system of FIG. 4;
FIG. 9 is a schematic workflow diagram of a cluster statistics module of the system of FIG. 4;
Fig. 10 is a schematic workflow diagram of an output module of the system of fig. 4.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with some aspects of the application, as detailed in the appended claims.
Examples
Referring to fig. 1, fig. 1 is a flow chart of a method for detecting a commodity hot spot based on a monitoring video according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
s101: extracting video images frame by frame from real-time monitoring video of commodities;
in some embodiments, prior to S101, the method further comprises: acquiring an initial background image set by a user, and acquiring a detection area set by the user based on the initial background image; based on a user instruction, if the user selects to perform orthogonal projection transformation, calculating to obtain a projection transformation matrix according to the pixel coordinates of the detection area, and performing orthogonal projection transformation on an initial background image set by the user by using the projection transformation matrix to serve as the preset background image; and if the user selects not to perform orthogonal projection transformation, taking the initial background image as the preset background image.
That is, if the method is applied for the first time, the user must first set an initial background image and set a detection area within it. The initial background image is an image selected from the surveillance video (real-time or historical) that contains no person and has the same lighting conditions as during actual detection. The detection area is the area where the commodities to be detected are located: for example, if the camera monitors commodities placed on a table and the table's surroundings, the area occupied by the whole table can be set as the detection area, and the monitoring picture outside the table is a non-detection area; changes in the non-detection area are ignored in the detection and analysis of the subsequent steps.
In addition, the orthogonal projection transformation projects the three-dimensional scene onto a two-dimensional image while keeping the relative distances between objects unchanged after the transformation. In a specific implementation, the RoI, Src and Dst coordinates are first generated from the input detection-area coordinates; except for the RoI coordinates, the points are ordered top-left, top-right, bottom-right, bottom-left. The calculation formulas are as follows:
wherein the RoI coordinates:

RoI_x = min(x0, x1, x2, x3)
RoI_y = min(y0, y1, y2, y3)
RoI_w = max(x0, x1, x2, x3) - min(x0, x1, x2, x3)
RoI_h = max(y0, y1, y2, y3) - min(y0, y1, y2, y3)

the Src coordinates:

Src_xi = xi - RoI_x
Src_yi = yi - RoI_y

and the Dst coordinates:

Dst_xi = (0, RoI_w, RoI_w, 0)
Dst_yi = (0, RoI_h, RoI_h, 0)
subsequently, the Src and Dst coordinates are combined and the projective transformation matrix M is solved by the SVD (Singular Value Decomposition) algorithm; M is the 3×3 projective (homography) matrix mapping each Src point to its corresponding Dst point (the original publication gives the matrix as an image). The projective transformation matrix M is used again in a subsequent step.
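As a rough illustration (not code from the patent, which only names the SVD algorithm), the RoI/Src/Dst construction and the solution of M can be sketched in Python with NumPy; the function name and the direct-linear-transform (DLT) formulation of the SVD step are assumptions:

```python
import numpy as np

def roi_and_homography(points):
    """From the four detection-area corners (top-left, top-right,
    bottom-right, bottom-left), compute the RoI bounding box, build the
    Src/Dst point sets as in the formulas above, and solve the 3x3
    projective matrix M via SVD (DLT method)."""
    pts = np.asarray(points, dtype=float)          # shape (4, 2)
    xs, ys = pts[:, 0], pts[:, 1]
    roi_x, roi_y = xs.min(), ys.min()
    roi_w, roi_h = xs.max() - xs.min(), ys.max() - ys.min()

    src = pts - [roi_x, roi_y]                     # corners relative to the RoI
    dst = np.array([[0, 0], [roi_w, 0], [roi_w, roi_h], [0, roi_h]], dtype=float)

    # Direct Linear Transform: each point correspondence yields two rows of A,
    # and the homography is the null vector of A (last right-singular vector).
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A))
    M = vt[-1].reshape(3, 3)
    M /= M[2, 2]                                   # normalise so M[2, 2] == 1
    return (roi_x, roi_y, roi_w, roi_h), M
```

For an axis-aligned rectangular detection area, Src and Dst coincide and M reduces to the identity; for a skewed quadrilateral, M rectifies it onto the RoI_w × RoI_h rectangle.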
S102: sequentially carrying out personnel access detection on each frame of video image so as to detect whether personnel enter a preset detection area;
the purpose of this step is to improve the accuracy of the commodity hot spot counting in the subsequent steps; that is, the subsequent steps are performed only when a person is detected entering the detection area.
In some embodiments, the process of personnel access detection specifically includes:
sequentially carrying out foreground detection on the detection area of each frame of video image, and calculating to obtain a foreground pixel duty ratio; if the foreground pixel duty ratio is larger than a preset second threshold value, counting the number of frames of high foreground images in continuous images of a first preset number of frames; wherein the high foreground image is an image with a foreground pixel ratio greater than the second threshold; if the frame number of the high foreground image is larger than a preset third threshold value, judging whether the foreground pixel ratio of continuous images with at least a second preset number of frames continuously becomes larger in the high foreground image; wherein the second preset number is less than or equal to the first preset number; if the foreground pixel ratio of the continuous images with at least the second preset number of frames continuously increases, determining that a person enters a preset detection area.
Specifically, the foreground is the opposite of the background, and foreground detection, that is, detecting the number and coordinates of foreground pixels, can be implemented with the ViBe algorithm. The ratio of foreground pixels to the total number of pixels in the detection area is computed; if it exceeds the set second threshold (for example 30%), the detection area of this frame is considered to "possibly contain a person". Judgment then continues: n consecutive frames (the first preset number) containing this frame are selected, and the number of high-foreground images among these n frames is counted. If that count exceeds the set third threshold (for example 80% of n), "a person is in the detection area" is confirmed; otherwise no person is in the detection area, and the foreground ratio exceeding the threshold may be a detection error caused by lighting changes or the like. After confirming that a person is in the detection area, it is further judged whether the person entered from outside the area, i.e. whether among the previously obtained high-foreground images there are at least m consecutive frames (the second preset number) whose foreground pixel ratio keeps increasing. If so, it is finally confirmed that a person has entered the detection area and the next step is executed; for any other judgment result, person entry/exit detection continues until a "person entered the detection area" result is obtained.
For example, suppose a customer walks into the detection area from outside, touches a commodity, and leaves the area after inspecting it. In theory the foreground pixel ratio starts at 0; as the customer walks in, it first rises past the set threshold and then rises to its maximum (when the ratio is largest, the customer can be considered fully inside the detection area). While the customer inspects the commodity, the ratio fluctuates around the maximum (it changes with the person's posture and so on); when the customer leaves, the ratio gradually falls below the set threshold and then back to 0.
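The entry-detection logic described above can be sketched as follows; the function name and all parameter values (30% threshold, window length, run length) are illustrative assumptions, not values fixed by the patent:

```python
def person_entered(fg_ratios, thresh=0.30, n=20, high_frames=16, m=5):
    """Decide whether a person has entered the detection area from a
    sequence of per-frame foreground pixel ratios (latest frame last).
    thresh      -- second threshold: ratio above which a frame counts as
                   a high-foreground frame (e.g. 30%)
    n           -- first preset number: length of the inspected window
    high_frames -- third threshold: minimum count of high-foreground
                   frames within the window (e.g. 80% of n)
    m           -- second preset number: consecutive frames whose ratio
                   must keep increasing (m <= n)"""
    if len(fg_ratios) < n or fg_ratios[-1] <= thresh:
        return False                       # frame not "possibly a person"
    window = fg_ratios[-n:]
    high = [r for r in window if r > thresh]
    if len(high) <= high_frames:
        return False                       # likely lighting noise, not a person
    # Look for m consecutive strictly increasing ratios among the
    # high-foreground frames: the signature of someone walking in.
    run = 1
    for prev, cur in zip(high, high[1:]):
        run = run + 1 if cur > prev else 1
        if run >= m:
            return True
    return False
```

A flat-but-high ratio sequence (person already inside, or a lighting shift) is rejected because the increasing run never appears, which matches the patent's distinction between "a person is in the area" and "a person entered the area".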
S103: if the person is detected to enter the detection area, generating a differential gradient image and a binary characteristic image of the detection area based on the current video image and a preset background image; the preset background image is an initial background image set by a user or an image obtained by performing orthogonal projection transformation on the initial background image;
in some embodiments, the specific process of generating the differential gradient image and the binary feature image includes:
Generating a current gradient image and a current salient image of the detection area from the current video image; if the preset background image is an image subjected to orthogonal projection transformation, the current video image is subjected to orthogonal projection transformation, and then the current gradient image and the current significant image are generated; performing difference on the basis of the current gradient image and a template gradient image which is generated in advance by the background image to obtain a difference gradient image; based on a preset self-adaptive threshold value, binarizing the current gradient image to obtain a binary gradient image, and binarizing the current significant image to obtain a binary significant image; and calculating to obtain a binary characteristic image based on the binary gradient image and the binary significant image.
Specifically, an image can be seen as a two-dimensional discrete function, and the image gradient is in fact the derivative of this two-dimensional discrete function. A salient image is an image that displays the uniqueness of each pixel; its purpose is to simplify or change the representation of the image into a form that is easier to analyse. In this embodiment, the Sobel detection algorithm and the AC algorithm may be applied to obtain the gradient image and the salient image respectively; both are algorithms commonly used in the prior art, so their calculation processes are not described in detail. Note only that if the preset background image has undergone orthogonal projection transformation, the current video image must likewise be transformed before the current gradient image and current salient image are generated.
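As a minimal sketch of the gradient-image step (the AC saliency algorithm is omitted), a Sobel gradient-magnitude image can be computed with NumPy. This is a generic textbook Sobel implementation, not code from the patent:

```python
import numpy as np

def sobel_gradient(gray):
    """Gradient-magnitude image of a 2-D grayscale array using the
    Sobel operator (the patent names Sobel for the gradient image)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T                                  # vertical-derivative kernel
    h, w = gray.shape
    padded = np.pad(gray.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Cross-correlate each kernel with the padded image.
    for i in range(3):
        for j in range(3):
            patch = padded[i:i + h, j:j + w]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)                    # per-pixel gradient magnitude
```

In the patent's pipeline, this would be run on both the (possibly projection-transformed) current frame and the background image, the latter producing the template gradient image that the differential gradient image is computed against.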
In addition, the preset adaptive threshold is computed from the detection area: it is a function of the detection-area size Area and of the custom calculation parameters α0, α1, β0, β1, γ0, γ1 and c, whose values depend on the actual detection environment (the original publication gives the formula as an image).
Binarization means converting the pixels of an image to only two values, 0 (black) and 255 (white) (they may also be set to 0 and 1), so that the whole image exhibits a distinct black-and-white effect.
Based on the adaptive threshold Thresh, the binary gradient image BinaryImage can be expressed as:

BinaryImage(i, j) = 1 if x > Thresh, otherwise 0

where x is the actual pixel value; that is, when the actual pixel value is greater than the adaptive threshold the binarization result is 1 (white), otherwise it is 0 (black).
Similarly, the binary saliency image SalientImage can be expressed as:

SalientImage(i, j) = 1 if x > Thresh, otherwise 0
further, the binary feature image FeatureImage can be expressed as:

FeatureImage(i, j) = BinaryImage(i, j) × SalientImage(i, j)

where (i, j) are the coordinates of a pixel, i.e. row i, column j.
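The binarization and element-wise product above can be sketched directly with NumPy (function name assumed); a pixel survives in FeatureImage only if it exceeds the threshold in both the gradient image and the salient image:

```python
import numpy as np

def binary_feature_image(gradient, salient, thresh):
    """FeatureImage(i, j) = BinaryImage(i, j) * SalientImage(i, j):
    binarise both inputs against the adaptive threshold, then multiply
    element-wise, keeping only pixels significant in BOTH images."""
    binary_grad = (gradient > thresh).astype(np.uint8)   # BinaryImage
    binary_sal = (salient > thresh).astype(np.uint8)     # SalientImage
    return binary_grad * binary_sal                      # FeatureImage
```

The non-zero pixel count of the returned array is then compared with the first threshold of step S104 to decide whether the differential gradient image becomes the detection image.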
S104: if the number of the pixel points with the pixel values not being 0 in the binary characteristic image is larger than a preset first threshold value, taking the differential gradient image as a detection image;
specifically, based on the calculation result featureinformation, the number of non-0 pixels is counted, and if the number is greater than a set threshold value, the differential gradient image calculated in the above step is used as a detection image.
In addition, in some embodiments, the current gradient image calculated in the above steps may be used as an updated template gradient image, so as to improve the detection accuracy of the subsequent other frame images.
S105: based on the detection image, counting the number of hot spots of different commodities by an information clustering method;
in some embodiments, the specific process of this step includes:
executing a sliding window algorithm on the current frame detection image; sequentially judging whether the number of the pixel points with the pixel values not being 0 in each sliding window is larger than 0, and if the number is larger than 0, generating a primary detection frame containing all the pixel points with the pixel values not being 0 in the current sliding window; based on the size information and the coincidence degree information of the detection frames, all the first-level detection frames in the current frame detection image are adjacently combined to obtain a plurality of second-level detection frames; performing coordinate limitation on each secondary detection frame to obtain a plurality of tertiary detection frames with coordinates limited in a preset coordinate threshold range; based on the size information and the coincidence degree information of the detection frames, carrying out cluster fusion on each three-level detection frame in the current frame detection image and the final-level detection frame of the previous frame detection image to obtain the final-level detection frame in the current frame detection image; and adding 1 to the number of hot spots representing commodities in corresponding positions after each clustering fusion, wherein each level of detection frame is rectangular in shape.
Specifically, the generated first-level detection frame refers to the smallest rectangular frame containing all pixel points with the pixel value of not 0.
The size information of a detection frame refers to the values of the width and height (or the area) of the rectangular frame; if these are smaller than a preset value, the frame is merged into other detection frames. The coincidence degree information refers to the proportion of the coincidence (overlapping) area of two (or more) detection frames to their total area; if it is higher than a preset value, the frames are merged. The process of adjacent merging can be expressed as:
(merging formula presented as an image in the original document)
where Rect_a and Rect_b are the coordinate information of detection frame a and detection frame b respectively; each consists of three pieces of information: the coordinates (i, j) of the upper-left corner pixel, the width of the detection frame, and the height.
In addition, the coordinate limitation refers to limiting the width and height of the second-level detection frames obtained in the above steps within preset maximum ranges. The process of cluster fusion is similar to the adjacent-merging process described above and can also be expressed by the above formula. Each time a cluster fusion is completed, the hot spot count of the commodity at the corresponding position is increased by 1.
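The merging rule described above can be sketched as follows. This is a minimal pure-Python sketch: the rectangle layout (i, j, width, height) follows the text, but the concrete thresholds min_area and min_overlap and the repeat-until-stable loop are illustrative assumptions rather than the patent's exact procedure:

```python
def overlap_ratio(a, b):
    """Proportion of the coincidence (intersection) area of two rectangles
    to their total (union) area; rectangles are (i, j, width, height) with
    (i, j) the upper-left pixel, matching Rect_a / Rect_b in the text."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def merge(a, b):
    """Smallest rectangle containing both a and b."""
    i, j = min(a[0], b[0]), min(a[1], b[1])
    return (i, j,
            max(a[0] + a[2], b[0] + b[2]) - i,
            max(a[1] + a[3], b[1] + b[3]) - j)

def adjacent_merge(boxes, min_area=16, min_overlap=0.1):
    """Repeatedly merge any pair whose coincidence ratio exceeds
    min_overlap, or that overlaps at all when one frame is smaller
    than min_area, until no more merges are possible."""
    boxes = list(boxes)
    changed = True
    while changed:
        changed = False
        for x in range(len(boxes)):
            for y in range(x + 1, len(boxes)):
                a, b = boxes[x], boxes[y]
                r = overlap_ratio(a, b)
                small = a[2] * a[3] < min_area or b[2] * b[3] < min_area
                if r > min_overlap or (small and r > 0):
                    boxes[x] = merge(a, b)
                    del boxes[y]
                    changed = True
                    break
            if changed:
                break
    return boxes
```

The same helpers would serve for the cluster-fusion step, since the text notes that fusion follows the same rule.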
In some embodiments, the final-level detection frames in the current frame detection image can additionally be de-duplicated and their information verified, improving the accuracy of the resulting commodity hot spot counts. De-duplication excludes cases where the same touch is counted more than once, and information verification checks that the information of the de-duplicated detection frames is correct.
S106: displaying the hot spot times in the real-time monitoring video.
If orthogonal projective transformation was performed in the above steps, the inverse of the projective transformation matrix must be calculated and used to inverse-transform the detection frames obtained above.
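The inverse transformation of a detection frame can be sketched as below: apply the inverse of the projective transformation matrix to the frame's four corners and take their bounding box. This assumes the usual 3×3 homography convention with homogeneous coordinates; the function name and the (i, j, w, h) tuple layout are illustrative:

```python
import numpy as np

def back_project_box(box, H):
    """Map a detection frame back to original image coordinates by
    applying the inverse of projective matrix H to its four corners
    and taking the axis-aligned bounding box of the results."""
    i, j, w, h = box
    corners = np.array([[i, j, 1], [i + w, j, 1],
                        [i, j + h, 1], [i + w, j + h, 1]], dtype=float).T
    Hinv = np.linalg.inv(H)
    p = Hinv @ corners
    p = p[:2] / p[2]          # perspective divide
    x0, y0 = p[0].min(), p[1].min()
    return (x0, y0, p[0].max() - x0, p[1].max() - y0)
```

Because a projective transform does not in general map rectangles to rectangles, taking the bounding box of the transformed corners is one reasonable convention.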
The number of hotspots may then be displayed in the real-time surveillance video in thermodynamic-diagram (heat map) or numerical form. In the thermodynamic-diagram form, different hot spot counts are represented by different colors (or shades of a color); in the numerical form, the corresponding hot spot count is indicated by a specific number.
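A minimal stand-in for the thermodynamic-diagram rendering: each detection frame tints its region of the frame with a red intensity proportional to its touch count. The dict layout of the boxes (keys "rect" and "nTouch"), the BGR channel order, and the max_touch normalization are illustrative assumptions:

```python
import numpy as np

def touch_overlay(frame, boxes, max_touch=20):
    """Blend a red heat tint into each detection frame's region of a
    BGR frame; stronger red means a higher touch count."""
    out = frame.astype(float).copy()
    for b in boxes:
        i, j, w, h = b["rect"]
        alpha = min(b["nTouch"] / max_touch, 1.0)   # 0..1 heat strength
        region = out[j:j + h, i:i + w]              # view into out
        region[..., 2] = (1 - alpha) * region[..., 2] + alpha * 255  # red channel
    return out.astype(np.uint8)
```

A real implementation would more likely use a proper color map (e.g. OpenCV's cv2.applyColorMap) and draw the numeric count next to each frame.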
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
according to the technical scheme provided by the embodiments of the application, the monitoring video of the commodity is processed and analyzed so that the number of times the commodity is touched can be identified by a clustering statistics method, thereby determining the number of commodity hot spots. Before the clustering statistics, it is determined whether a person has entered the detection area, and the hot spot counting is performed only if so; this avoids detection errors caused by light changes, camera shake, and similar factors, and improves detection accuracy. Therefore, compared with traditional methods, the technical scheme provided by the application can output the commodity hot spot counts and the related thermodynamic diagram for the detection area in real time in a complex store environment, has no special requirement on the commodity placement angle, and can thus greatly reduce the workload of store operators and management staff.
In order to more fully explain the technical scheme of the application, the embodiment of the application further provides a commodity hot spot detection device based on the monitoring video, which corresponds to the commodity hot spot detection method based on the monitoring video provided by the embodiment of the application.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a commodity hot spot detection device based on a monitoring video according to an embodiment of the present application. As shown in fig. 2, the apparatus includes:
an extracting module 21, configured to extract video images from real-time monitoring video of the commodity frame by frame;
the detection module 22 is configured to sequentially perform personnel access detection on each frame of video image, so as to detect whether a person enters a preset detection area; wherein the detection area is the area where the commodity is located;
the generating module 23 is configured to generate a differential gradient image and a binary feature image of the detection area based on the current video image and a preset background image if it is detected that a person enters the detection area; the preset background image is an initial background image set by a user or an image obtained by performing orthogonal projection transformation on the initial background image;
the setting module 24 is configured to take the differential gradient image as a detection image if the number of pixel points with non-0 pixel values in the binary feature image is greater than a preset first threshold;
The statistics module 25 is configured to count the number of hot spots of different commodities by using an information clustering method based on the detected image;
and the display module 26 is used for displaying the hot spot times in the real-time monitoring video.
Specifically, the specific implementation manner of the function of each functional module may be implemented with reference to the content in the monitoring video-based commodity hot spot detection method, which will not be described in detail.
In order to more fully explain the technical scheme of the application, the embodiment of the application further provides a commodity hot spot detection device based on the monitoring video, which corresponds to the commodity hot spot detection method based on the monitoring video provided by the embodiment of the application.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a commodity hot spot detection apparatus based on a monitoring video according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:
a memory 31 and a processor 32 connected to the memory 31;
the memory 31 is configured to store a program for performing at least the above-described surveillance video-based commodity hot spot detection method;
the processor 32 is used to call and execute the program stored in the memory 31.
Specifically, the device may be a computer or a similar independent device, or may be directly integrated into an edge device such as a monitoring camera, where the specific implementation of the function of the program may be implemented by referring to the content in the above-mentioned merchandise hotspot detection method based on the monitoring video, which will not be described in detail.
The foregoing is a general description of the technical solutions of the present application, and for the convenience of the skilled person, the following description will be given by way of a specific example.
Referring to fig. 4-10, fig. 4 is a schematic structural diagram of a merchandise hot spot detection system based on a surveillance video according to an embodiment of the present application, and fig. 5-10 are schematic working flow diagrams of each module of the system shown in fig. 4.
As shown in fig. 4, the system includes: a configuration module 41, an initialization module 42, a personnel access detection module 43, a detection image generation module 44, a cluster statistics module 45 and an output module 46.
The configuration module 41 is mainly configured to set a detection area and determine the rationality of its coordinates, and the workflow of the configuration module is shown in fig. 5, and includes:
inputting a background image, i.e. an image, selected by the user, taken with no people present and under the same lighting conditions as during actual detection;
setting a detection area, namely setting an area which needs to be detected and contains commodities by a user;
judging the rationality of the detection area coordinates and storing them; the rationality check limits the coordinates according to the size of the background image: the maximum value of the detection area's length and width coordinates is the length and width of the background image, and the minimum value is 0;
the user selects whether orthogonal projection transformation is carried out, if so, a projection transformation matrix is calculated according to the detection area coordinates, and matrix parameters are stored; if not, ending the process of the module.
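The rationality check on the detection area coordinates can be sketched as a simple bounds test. A minimal sketch, assuming the region is given as (i, j, w, h) with (i, j) the upper-left pixel; the tuple layout and function name are illustrative:

```python
def region_is_reasonable(region, img_w, img_h):
    """Rationality check: every coordinate of the detection region must
    lie between 0 and the background image's width/height, and the
    region must have positive size."""
    i, j, w, h = region
    return (0 <= i and 0 <= j
            and i + w <= img_w and j + h <= img_h
            and w > 0 and h > 0)
```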
The initialization module 42 is mainly used for performing relevant initialization on each required algorithm, and the workflow of the initialization module is shown in fig. 6, and includes:
the background image is transmitted in and is stored as a pbMask;
reading the detection area coordinates in the configuration module 41;
adaptive threshold calculation;
initializing each storage space;
initialization of the personnel access detection module 43;
combining the pbMask and the projective transformation matrix in the configuration module 41, if projective transformation is selected, performing orthogonal projective transformation on the pbMask by using the projective transformation matrix, otherwise, performing no transformation, and then respectively adopting an AC algorithm and a Sobel detection algorithm to construct an initialized salient image Salientmask and a template gradient image Gradientmask based on the generated pbMask.
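The Sobel part of the template-gradient construction can be sketched in pure numpy (the same idea as cv2.Sobel, using slicing instead of an explicit convolution loop; leaving the one-pixel border at zero is a simplification):

```python
import numpy as np

def sobel_gradient(img):
    """Gradient-magnitude image via the 3x3 Sobel operator: horizontal
    and vertical responses combined with np.hypot."""
    img = img.astype(float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Sobel x: (column c+1) - (column c-1), weighted [1, 2, 1] vertically.
    gx[1:-1, 1:-1] = (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:]
                      - img[:-2, :-2] - 2 * img[1:-1, :-2] - img[2:, :-2])
    # Sobel y: (row r+1) - (row r-1), weighted [1, 2, 1] horizontally.
    gy[1:-1, 1:-1] = (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:]
                      - img[:-2, :-2] - 2 * img[:-2, 1:-1] - img[:-2, 2:])
    return np.hypot(gx, gy)
```

Applied to the (possibly projectively transformed) pbMask, the result would play the role of the template gradient image Gradientmask; the AC-algorithm salient image is a separate computation not sketched here.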
The personnel access detection module 43 is mainly used for detecting personnel access, thereby improving the hot spot statistics accuracy, and the working flow thereof is shown in fig. 7, and comprises:
obtaining a detection area image based on a frame to be detected;
using a ViBe algorithm to the detection area image to perform foreground detection;
calculating the foreground pixel ratio (frontRatio);
judging the foreground pixel ratio: if it is larger than the threshold value, continue; otherwise, pass in the next frame image;
counting the number nCnt of high-foreground-ratio frames among the most recent n frames, and updating the statistical variation parameter nParm. nParm records occurrences of m consecutive frames (m ≤ n) with a rising foreground ratio; each such occurrence adds 1 to nParm. When nCnt is greater than a threshold: if nParm is greater than its set threshold, a person is judged to have entered, otherwise to have left. When nCnt is smaller than the threshold, nParm is reset to 0;
setting the personnel access detection flag bit bPeopleOut for the entry/exit judgment; bPeopleOut takes two values, True and False, where True indicates that no person is present (the person has left) and False indicates that a person has entered.
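The counter logic above can be sketched as a small stateful detector. This is one plausible reading of the nCnt/nParm bookkeeping, with all thresholds (n, m, ratio and count thresholds) chosen arbitrarily for illustration; the class and attribute names are assumptions:

```python
from collections import deque

class AccessDetector:
    """Keep the last n foreground ratios, count frames above the ratio
    threshold (nCnt), and bump nParm each time m consecutive frames show
    a rising foreground ratio; decide entry/exit from the two counters."""

    def __init__(self, n=10, m=3, ratio_thr=0.1, cnt_thr=5, parm_thr=0):
        self.history = deque(maxlen=n)
        self.m, self.ratio_thr = m, ratio_thr
        self.cnt_thr, self.parm_thr = cnt_thr, parm_thr
        self.n_parm = 0          # runs of m consecutive rising frames
        self.rising = 0          # current rising streak length
        self.last_ratio = 0.0
        self.b_people_out = True  # True: no person in the region

    def step(self, front_ratio):
        if front_ratio <= self.ratio_thr:
            return self.b_people_out   # low ratio: wait for next frame
        self.history.append(front_ratio)
        self.rising = self.rising + 1 if front_ratio > self.last_ratio else 0
        if self.rising >= self.m:      # m consecutive rising frames
            self.n_parm += 1
            self.rising = 0
        self.last_ratio = front_ratio
        n_cnt = sum(1 for r in self.history if r > self.ratio_thr)
        if n_cnt > self.cnt_thr:
            # nParm above its threshold: person judged to have entered.
            self.b_people_out = self.n_parm <= self.parm_thr
        else:
            self.n_parm = 0            # reset when nCnt is small
        return self.b_people_out
```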
The detection image generating module 44 is mainly configured to generate a specific detection image, and the workflow of the detection image generating module is shown in fig. 8, and includes:
passing in the current frame detection area image and selecting whether to perform projective transformation; if yes, projective transformation is performed using the projective transformation matrix passed in from the configuration module 41;
generating the current frame gradient image and salient image, and combining them with the template gradient image generated by the initialization module 42 to generate the current frame differential gradient image;
performing self-adaptive threshold binarization on the current frame differential gradient image, and combining the binary salient image to obtain a binary characteristic image;
counting the number bCnt of non-zero pixels in the binary feature image and judging it against the statistical threshold; if the number of non-zero pixels is larger than the threshold, the differential gradient image is set as the detection image;
updating the Gradientmask: the current gradient image is set as the new Gradientmask.
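The differencing and binarization steps of this module can be sketched together. A minimal numpy sketch, assuming the binary salient image is already available as a 0/1 array; the function name, the absolute-difference choice, and the AND combination are illustrative assumptions:

```python
import numpy as np

def make_detection_image(cur_gradient, template_gradient, binary_salient,
                         adaptive_thr, count_thr):
    """Difference the current gradient against the template gradient,
    binarize with the adaptive threshold, AND with the binary salient
    image, then keep the differential gradient only when enough feature
    pixels survive."""
    diff_gradient = np.abs(cur_gradient - template_gradient)
    binary_gradient = (diff_gradient > adaptive_thr).astype(np.uint8)
    # Binary feature image: pixels that changed AND are salient.
    binary_feature = binary_gradient & binary_salient
    b_cnt = int(np.count_nonzero(binary_feature))
    return diff_gradient if b_cnt > count_thr else None
```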
The cluster statistics module 45 is mainly used for counting the number of hot spots by using an information clustering method, and the workflow of the cluster statistics module is shown in fig. 9 and comprises:
applying a sliding window algorithm to the detection image: obtain the maximum pixel value in the current sliding window and count the number of non-zero pixels in it; if that number is greater than 0, generate a detection frame DetBoxNow with the size of the sliding window, set its size-information field nType to 1, set its current flag bit nFlag to 0, and record the maximum pixel value in the window as the maximum touch count nTouch of the frame; otherwise, move on to the next sliding window. A detection frame's nType of 1 indicates that its size information (i.e. width and height) may change in the subsequent iteration process, for example the frame may grow after merging with other frames; if nType is 0, the size information is fixed. A current flag bit nFlag of 0 indicates that the frame's position (the upper-left pixel coordinate) is fixed in the subsequent iteration process; if nFlag is 1, the position may change;
initializing the output detection frame information DetBoxOut;
iterating over DetBoxNow, merging adjacent frames based on the size information and coincidence degree information of the detection frames, and updating the corresponding information to obtain DetBoxMerge;
limiting DetBoxMerge to the detection area coordinates;
combining the previous frame's detection frames DetBoxPre and their related information, fusing DetBoxMerge using the detection frame size information, clustering the information, updating the maximum touch counts, and setting the detection frame size information nType to 0 and the current flag bit nFlag to 1;
de-duplicating the detection frame;
checking the detection frame information to ensure it is correct, storing it into the output detection frames DetBoxOut, and updating DetBoxPre, i.e. taking DetBoxNow as DetBoxPre.
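The sliding-window step at the start of this module can be sketched as follows. A minimal numpy sketch: the window size, stride, and the dict layout (keys "rect", "nType", "nFlag", "nTouch") are illustrative assumptions, not the patent's exact data structure:

```python
import numpy as np

def first_level_boxes(det_img, win=8, stride=8):
    """Sliding-window pass emitting first-level detection frames: for each
    window containing non-zero pixels, the smallest rectangle
    (i, j, w, h) enclosing them, plus the window's pixel maximum as the
    maximum touch count nTouch."""
    boxes = []
    rows, cols = det_img.shape
    for top in range(0, rows - win + 1, stride):
        for left in range(0, cols - win + 1, stride):
            window = det_img[top:top + win, left:left + win]
            ys, xs = np.nonzero(window)
            if ys.size == 0:
                continue  # no non-zero pixels: slide on
            rect = (int(left + xs.min()), int(top + ys.min()),
                    int(xs.max() - xs.min()) + 1, int(ys.max() - ys.min()) + 1)
            boxes.append({"rect": rect,
                          "nType": 1,    # size may change in later merges
                          "nFlag": 0,    # position is fixed for now
                          "nTouch": int(window.max())})
    return boxes
```

The later stages (adjacent merging, coordinate limitation, fusion with the previous frame's boxes) would then iterate over this list.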
The output module 46 is mainly configured to obtain the detection frame information generated by the cluster statistics module 45, and draw a corresponding thermal image, and its workflow is shown in fig. 10, and includes:
projection verification: if orthogonal projective transformation was used, the projective transformation matrix passed in from the configuration module 41 is needed; its inverse matrix is calculated and the detection frame set DetBoxOut is inverse-projected;
and drawing the thermodynamic diagram and outputting the hot spot count statistics image.
It is to be understood that the same or similar parts of the above embodiments may refer to one another; content not detailed in one embodiment may refer to the same or similar content in other embodiments.
It should be noted that in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, the meaning of "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Alternative implementations, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, are also within the scope of the preferred embodiments of the present application, as would be understood by those reasonably skilled in the art.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following well-known techniques may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (6)

1. The commodity hot spot detection method based on the monitoring video is characterized by comprising the following steps of:
extracting video images frame by frame from real-time monitoring video of commodities;
sequentially carrying out personnel access detection on each frame of video image so as to detect whether personnel enter a preset detection area; wherein the detection area is the area where the commodity is located;
if the person is detected to enter the detection area, generating a differential gradient image and a binary characteristic image of the detection area based on the current video image and a preset background image; the preset background image is an initial background image set by a user or an image obtained by performing orthogonal projection transformation on the initial background image;
if the number of the pixel points with the pixel values not being 0 in the binary characteristic image is larger than a preset first threshold value, taking the differential gradient image as a detection image;
Based on the detection image, counting the number of hot spots of different commodities by an information clustering method;
displaying the hot spot times in a real-time monitoring video;
before extracting the video image frame by frame from the real-time monitoring video of the commodity, the method further comprises the following steps:
acquiring an initial background image set by a user, and acquiring a detection area set by the user based on the initial background image;
based on a user instruction, if the user selects to perform orthogonal projection transformation, calculating to obtain a projection transformation matrix according to the pixel coordinates of the detection area, and performing orthogonal projection transformation on an initial background image set by the user by using the projection transformation matrix to serve as the preset background image; if the user selects not to perform orthogonal projection transformation, taking the initial background image as the preset background image;
the step of sequentially carrying out personnel access detection on each frame of video image to detect whether personnel enter a preset detection area or not comprises the following steps:
sequentially carrying out foreground detection on the detection area of each frame of video image, and calculating to obtain a foreground pixel duty ratio;
if the foreground pixel duty ratio is larger than a preset second threshold value, counting the number of frames of high foreground images in continuous images of a first preset number of frames; wherein the high foreground image is an image with a foreground pixel ratio greater than the second threshold;
If the frame number of the high foreground image is larger than a preset third threshold value, judging whether the foreground pixel ratio of continuous images with at least a second preset number of frames continuously becomes larger in the high foreground image; wherein the second preset number is less than or equal to the first preset number;
if the foreground pixel ratio of the continuous images with at least the second preset number of frames continuously increases, determining that a person enters a preset detection area;
generating a differential gradient image and a binary characteristic image of the detection area based on the current video image and a preset background image, wherein the method comprises the following steps:
generating a current gradient image and a current salient image of the detection area from the current video image; if the preset background image is an image subjected to orthogonal projection transformation, the current video image is subjected to orthogonal projection transformation, and then the current gradient image and the current significant image are generated;
performing difference on the basis of the current gradient image and a template gradient image which is generated in advance by the background image to obtain a difference gradient image;
based on a preset self-adaptive threshold value, binarizing the current gradient image to obtain a binary gradient image, and binarizing the current significant image to obtain a binary significant image;
Based on the binary gradient image and the binary salient image, calculating to obtain a binary characteristic image;
based on the detection image, counting the number of hot spots of different commodities by an information clustering method, wherein the method comprises the following steps:
executing a sliding window algorithm on the current frame detection image;
sequentially judging whether the number of the pixel points with the pixel values not being 0 in each sliding window is larger than 0, and if the number is larger than 0, generating a primary detection frame containing all the pixel points with the pixel values not being 0 in the current sliding window;
based on the size information and the coincidence degree information of the detection frames, all the first-level detection frames in the current frame detection image are adjacently combined to obtain a plurality of second-level detection frames;
performing coordinate limitation on each secondary detection frame to obtain a plurality of tertiary detection frames with coordinates limited in a preset coordinate threshold range;
based on the size information and the coincidence degree information of the detection frames, carrying out cluster fusion on each three-level detection frame in the current frame detection image and the final-level detection frame of the previous frame detection image to obtain the final-level detection frame in the current frame detection image; every time the clustering fusion is completed, the number of hot spots of the commodity in the corresponding position is increased by 1;
wherein, the detection frames at all levels are rectangular in shape.
2. The method according to claim 1, wherein if the number of pixels of the binary feature image is greater than a preset first threshold, after taking the differential gradient image as the detection image, further comprising:
updating the template gradient image, and taking the current gradient image as an updated template gradient image.
3. The method according to claim 1, wherein the method further comprises:
and performing de-duplication and information verification on the final detection frame in the current frame detection image, thereby improving the accuracy of the obtained commodity hot spot times.
4. The method of claim 1, wherein displaying the number of hotspots in a real-time surveillance video comprises:
and displaying the hot spot times in a real-time monitoring video in a thermodynamic diagram or digital form.
5. Commodity hot spot detection device based on surveillance video, characterized by comprising:
the extraction module is used for extracting video images frame by frame from the real-time monitoring video of the commodity;
before extracting the video image frame by frame from the real-time monitoring video of the commodity, the method further comprises the following steps:
acquiring an initial background image set by a user, and acquiring a detection area set by the user based on the initial background image;
Based on a user instruction, if the user selects to perform orthogonal projection transformation, calculating to obtain a projection transformation matrix according to the pixel coordinates of the detection area, and performing orthogonal projection transformation on an initial background image set by the user by using the projection transformation matrix to serve as a preset background image; if the user selects not to perform orthogonal projection transformation, the initial background image is used as the preset background image.
The detection module is used for sequentially carrying out personnel access detection on each frame of video image so as to detect whether personnel enter a preset detection area; wherein the detection area is the area where the commodity is located;
the step of sequentially carrying out personnel access detection on each frame of video image to detect whether personnel enter a preset detection area or not comprises the following steps:
sequentially carrying out foreground detection on the detection area of each frame of video image, and calculating to obtain a foreground pixel duty ratio;
if the foreground pixel duty ratio is larger than a preset second threshold value, counting the number of frames of high foreground images in continuous images of a first preset number of frames; wherein the high foreground image is an image with a foreground pixel ratio greater than the second threshold;
if the frame number of the high foreground image is larger than a preset third threshold value, judging whether the foreground pixel ratio of continuous images with at least a second preset number of frames continuously becomes larger in the high foreground image; wherein the second preset number is less than or equal to the first preset number;
If the foreground pixel ratio of the continuous images with at least the second preset number of frames continuously increases, determining that a person enters a preset detection area;
the generation module is used for generating a differential gradient image and a binary characteristic image of the detection area based on the current video image and a preset background image if the person is detected to enter the detection area; the preset background image is an initial background image set by a user or an image obtained by performing orthogonal projection transformation on the initial background image;
the setting module is used for taking the differential gradient image as a detection image if the number of the pixel points with the pixel values not being 0 in the binary characteristic image is larger than a preset first threshold value;
the generating the differential gradient image and the binary characteristic image of the detection area based on the current video image and the preset background image comprises the following steps:
generating a current gradient image and a current salient image of the detection area from the current video image; if the preset background image is an image subjected to orthogonal projection transformation, the current video image is subjected to orthogonal projection transformation, and then the current gradient image and the current significant image are generated;
Performing difference on the basis of the current gradient image and a template gradient image which is generated in advance by the background image to obtain a difference gradient image;
based on a preset self-adaptive threshold value, binarizing the current gradient image to obtain a binary gradient image, and binarizing the current significant image to obtain a binary significant image;
based on the binary gradient image and the binary salient image, calculating to obtain a binary characteristic image;
The statistics module is used for counting the number of hot spots of different commodities by an information clustering method based on the detection image;
wherein counting the number of hot spots of different commodities by the information clustering method based on the detection image comprises:
executing a sliding window algorithm on the current-frame detection image;
sequentially judging whether the number of pixel points with non-zero pixel values in each sliding window is greater than 0, and if so, generating a primary detection frame containing all the non-zero pixel points in the current sliding window;
adjacently merging all the primary detection frames in the current-frame detection image based on size information and overlap information of the detection frames, to obtain a plurality of secondary detection frames;
performing coordinate limitation on each secondary detection frame to obtain a plurality of tertiary detection frames whose coordinates are limited within a preset coordinate threshold range;
performing cluster fusion between each tertiary detection frame in the current-frame detection image and the final-level detection frames of the previous-frame detection image based on the size information and overlap information of the detection frames, to obtain the final-level detection frames in the current-frame detection image; each time a cluster fusion is completed, increasing the number of hot spots of the commodity at the corresponding position by 1;
wherein the detection frames at all levels are rectangular;
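The sliding-window and frame-merging steps above can be sketched as a single-frame pipeline. This is an illustrative simplification under stated assumptions: the window size, stride, the overlap-or-adjacency test, and the greedy merging order are all choices made here, not taken from the patent, and the cross-frame cluster fusion against the previous frame's final-level frames is omitted.

```python
import numpy as np

def hotspot_boxes(det, win=4, stride=4, bounds=None):
    """Sketch of the claimed clustering steps on one binary detection image:
    slide a window over the image; each window holding non-zero pixels yields
    a primary frame tightly bounding those pixels; overlapping or adjacent
    primary frames are merged into secondary frames; secondary frames are
    clamped to a preset coordinate range (tertiary frames).
    Window size, stride and the merge rule are illustrative assumptions."""
    h, w = det.shape
    primary = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            ys, xs = np.nonzero(det[y:y + win, x:x + win])
            if ys.size > 0:  # window contains at least one non-zero pixel
                primary.append((int(x + xs.min()), int(y + ys.min()),
                                int(x + xs.max()), int(y + ys.max())))

    def touches(a, b):
        # Rectangles overlap or are directly adjacent (1-pixel tolerance)
        return not (a[2] + 1 < b[0] or b[2] + 1 < a[0] or
                    a[3] + 1 < b[1] or b[3] + 1 < a[1])

    merged = []
    for box in primary:  # greedy adjacent merging -> secondary frames
        for i, m in enumerate(merged):
            if touches(box, m):
                merged[i] = (min(box[0], m[0]), min(box[1], m[1]),
                             max(box[2], m[2]), max(box[3], m[3]))
                break
        else:
            merged.append(box)

    if bounds:  # coordinate limitation -> tertiary frames
        x0, y0, x1, y1 = bounds
        merged = [(max(a, x0), max(b, y0), min(c, x1), min(d, y1))
                  for a, b, c, d in merged]
    return merged
```

Each frame is an `(x_min, y_min, x_max, y_max)` rectangle; two well-separated blobs in the detection image therefore yield two separate frames, each of which would count as one hot spot for the commodity at that position.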
and the display module is used for displaying the number of hot spots in the real-time surveillance video.
6. Commodity hot spot detection equipment based on a surveillance video, characterized by comprising:
a memory and a processor coupled to the memory;
the memory being used for storing a program at least for performing the surveillance-video-based commodity hot spot detection method according to any one of claims 1 to 4;
the processor being used for calling and executing the program stored in the memory.
CN202010196433.4A 2020-03-19 2020-03-19 Commodity hot spot detection method, device and equipment based on monitoring video Active CN111401269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010196433.4A CN111401269B (en) 2020-03-19 2020-03-19 Commodity hot spot detection method, device and equipment based on monitoring video


Publications (2)

Publication Number Publication Date
CN111401269A CN111401269A (en) 2020-07-10
CN111401269B true CN111401269B (en) 2023-07-14

Family

ID=71432708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010196433.4A Active CN111401269B (en) 2020-03-19 2020-03-19 Commodity hot spot detection method, device and equipment based on monitoring video

Country Status (1)

Country Link
CN (1) CN111401269B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881829A (en) * 2020-07-28 2020-11-03 广州技诺智能设备有限公司 Method, device and monitoring equipment for detecting persons and counting the number of persons in public places
CN112399051A (en) * 2020-10-15 2021-02-23 中标慧安信息技术股份有限公司 Human body detection sound warning camera applied to garbage classification
CN112258469A (en) * 2020-10-20 2021-01-22 成都云盯科技有限公司 Rolling door state detection method and system
CN112749645B (en) * 2020-12-30 2023-08-01 成都云盯科技有限公司 Clothing color detection method, device and equipment based on monitoring video
CN113255431B (en) * 2021-04-02 2023-04-07 青岛小鸟看看科技有限公司 Reminding method and device for remote teaching and head-mounted display equipment
CN114998283B (en) * 2022-06-17 2025-07-25 云知声智能科技股份有限公司 Method and device for detecting lens shielding object
CN115187918B (en) * 2022-09-14 2022-12-13 中广核贝谷科技有限公司 Method and system for identifying moving object in monitoring video stream

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4145715A (en) * 1976-12-22 1979-03-20 Electronic Management Support, Inc. Surveillance system
JPH06346130A (en) * 1993-06-04 1994-12-20 Nippon Steel Corp Slag forming detection method by image processing
US7813528B2 (en) * 2007-04-05 2010-10-12 Mitsubishi Electric Research Laboratories, Inc. Method for detecting objects left-behind in a scene
CN103679123B (en) * 2012-09-17 2018-01-12 浙江大华技术股份有限公司 The method and system of analyte detection is left on a kind of ATM panels
IL224896A (en) * 2013-02-25 2017-09-28 Agent Video Intelligence Ltd Foreground extraction technique
US9454712B2 (en) * 2014-10-08 2016-09-27 Adobe Systems Incorporated Saliency map computation
US9449248B1 (en) * 2015-03-12 2016-09-20 Adobe Systems Incorporated Generation of salient contours using live video
CN107229894B (en) * 2016-03-24 2020-09-22 上海宝信软件股份有限公司 Intelligent video monitoring method and system based on computer vision analysis technology
CN109543650A (en) * 2018-12-04 2019-03-29 钟祥博谦信息科技有限公司 Warehouse intelligent control method and system
CN110458090A (en) * 2019-08-08 2019-11-15 成都睿云物联科技有限公司 Working state of excavator detection method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pedestrian target detection combining guided probability maps with salient features; Liu Qiong; High Technology Letters (Issue 05), pp. 464-474 *

Similar Documents

Publication Publication Date Title
CN111401269B (en) Commodity hot spot detection method, device and equipment based on monitoring video
JP6871314B2 (en) Object detection method, device and storage medium
CN109076198B (en) Video-based object tracking occlusion detection system, method and equipment
US9025875B2 (en) People counting device, people counting method and people counting program
US10049283B2 (en) Stay condition analyzing apparatus, stay condition analyzing system, and stay condition analyzing method
CN112883955B (en) Shelf layout detection method, device and computer readable storage medium
US9141873B2 (en) Apparatus for measuring three-dimensional position, method thereof, and program
US11145080B2 (en) Method and apparatus for three-dimensional object pose estimation, device and storage medium
TWI554976B (en) Surveillance systems and image processing methods thereof
CN113449606B (en) Target object identification method and device, computer equipment and storage medium
CN108596128B (en) Object recognition method, device and storage medium
CN110796141B (en) Target detection method and related equipment
EP2375376A1 (en) Method and arrangement for multi-camera calibration
JP5517504B2 (en) Image processing apparatus, image processing method, and program
US8355079B2 (en) Temporally consistent caption detection on videos using a 3D spatiotemporal method
CN112102409A (en) Target detection method, device, equipment and storage medium
US11100650B2 (en) Method for foreground and background determination in an image
CN102831427B (en) Texture feature extraction method fused with visual significance and gray level co-occurrence matrix (GLCM)
CN111444806B (en) Commodity touch information clustering method, device and equipment based on monitoring video
CN111310733A (en) Method, device and equipment for detecting personnel entering and exiting based on monitoring video
EP3073443B1 (en) 3d saliency map
CN109525786B (en) Video processing method and device, terminal equipment and storage medium
CN110119675B (en) A product identification method and device
CN115049598A (en) Method, system and equipment for detecting standard of trial product placed on store shelf
US12211272B2 (en) Multi-view visual data damage detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant