CN116800979B - Point cloud video coding method based on inter-frame implicit correlation - Google Patents
- Publication number
- CN116800979B (application CN202310865197.4A)
- Authority
- CN
- China
- Prior art keywords
- frame
- cube
- cubes
- point cloud
- inter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/172—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
A point cloud video coding method based on inter-frame implicit correlation relates to the field of point cloud video compression. The method first performs entropy-minimized motion compensation: a reference frame is generated by motion compensation so that it aligns the topological structure captured by the inter-frame implicit correlation while minimizing the conditional entropy. Each frame is divided into small cubes, the matching degree between two cubes is evaluated with an index, the best-matching cube is searched in the previous frame for each cube of the current frame, and the best matching cubes are spliced together to generate reference frames; the reference frame that minimizes the conditional entropy is selected as the entropy-minimized motion compensation output. Inter-frame entropy coding is then performed. The invention fully exploits the inter-frame redundancy of dynamic frames to losslessly compress point cloud video, effectively reducing the transmission bandwidth consumed by point cloud video streams; by exploiting the inter-frame implicit correlation it also effectively reduces the video data volume.
Description
Technical Field
The invention relates to the technical field of point cloud video compression, in particular to a point cloud video coding method based on inter-frame implicit correlation.
Background
Mainstream point cloud video codecs either project the point cloud video to 2D video before encoding, or encode the point cloud directly. The V-PCC encoder (Sebastian Schwarz et al. 2019. Emerging MPEG Standards for Point Cloud Compression. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, 1 (2019), 133-148) projects the geometry and attributes of the point cloud video onto multiple 2D video tracks. Vues (Yu Liu et al. 2022. Vues: Practical Mobile Volumetric Video Streaming through Multiview Transcoding. In MobiCom '22. Association for Computing Machinery, New York, NY, USA, 514-527) transcodes the point cloud video into 2D video using an edge server. Recently, there has also been work that encodes point cloud video directly. Draco, from Google, uses a kd-tree to encode the point cloud. GROOT (Kyungjin Lee et al. 2020. GROOT: A Real-Time Streaming System of High-Fidelity Volumetric Videos. In MobiCom '20. Association for Computing Machinery, New York, NY, USA, Article 57, 14 pages) proposes a parallel octree to improve decoding efficiency. YuZu (Anlan Zhang et al. 2022. YuZu: Neural-Enhanced Volumetric Video Streaming. In NSDI 22. USENIX Association, Renton, WA, 137-154) uses 3D super-resolution to increase point cloud density. AITransfer (Yakun Huang et al. 2021. AITransfer: Progressive AI-Powered Transmission for Real-Time Point Cloud Video Streaming. In MM '21. Association for Computing Machinery, New York, NY, USA, 3989-3997) uses an AI model to encode the point cloud.
In general, these systems encode each frame of point cloud independently without considering inter-frame redundancy information.
In recent years there has also been some progress in inter-frame coding of point cloud video. Kammerl et al. (Julius Kammerl et al. 2012. Real-time compression of point cloud streams. In 2012 IEEE International Conference on Robotics and Automation. 778-785) propose representing a point cloud frame as its difference from the previous frame. However, this approach is only effective on point cloud frames dominated by static content. Many later studies compress inter-frame redundancy using entropy-coding-based methods. Most of these methods rely on explicit correlation between frames, i.e. information repeated at the same or adjacent positions in adjacent frames. However, they are not specifically designed for point cloud video streaming and ignore video dynamics.
Disclosure of Invention
The invention aims to provide a point cloud video coding method based on inter-frame implicit correlation.
The technical scheme adopted by the invention for solving the technical problems is as follows:
The invention discloses a point cloud video coding method based on inter-frame implicit correlation, which comprises the following steps:
step S1, motion compensation with minimized entropy;
Step S1.1, voxelizing the point cloud, namely mapping the point cloud into a three-dimensional grid;
Step S1.2, generating a reference frame by adopting a motion compensation method, wherein the reference frame aligns a topological structure in implicit correlation between frames and simultaneously minimizes conditional entropy;
S1.2.1, dividing the frame into small cubes;
Step S1.2.2, evaluating the matching degree between the two cubes by using an index;
Step S1.2.3, searching a cube with the best matching degree from cubes of the previous frame for each cube of the current frame;
Step S1.2.4, splicing each best matching cube to generate a reference frame;
Step S1.2.5, selecting the reference frame capable of minimizing the conditional entropy as the entropy-minimized motion compensation output;
step S2, inter-frame entropy coding;
The current frame is encoded using the inter-frame entropy encoding algorithm S4D and the reference frame generated in step S1 capable of minimizing conditional entropy as a context.
Further, the specific operation flow of step S1.1 is as follows:
The voxelized point cloud frame is represented by a three-dimensional array; each member of the array, i.e. each voxel, is either empty or occupied. The conditional entropy H is defined as follows:

H = -[P_0(P_{0|0} log2 P_{0|0} + P_{1|0} log2 P_{1|0}) + P_1(P_{0|1} log2 P_{0|1} + P_{1|1} log2 P_{1|1})]

wherein P_0 represents the probability that a voxel in the previous frame is empty and P_1 the probability that it is occupied; P_{0|0} and P_{1|0} represent the conditional probabilities that the voxel in the current frame is empty or occupied, respectively, when the co-located voxel in the previous frame is empty; P_{0|1} and P_{1|1} represent the conditional probabilities that the voxel in the current frame is empty or occupied, respectively, when the co-located voxel in the previous frame is occupied.
Further, the specific operation flow of step S1.2.1 is as follows:
The current frame and the previous frame are denoted by I_t and I_{t-1}, respectively. The current frame I_t is divided into mutually disjoint cubes B_j^t with side length M voxels, each cube B_j^t located at position (x_j, y_j, z_j).
Further, the specific operation flow of step S1.2.2 is as follows:
For each cube B_j^t, the best matching cube B*_j^{t-1} is searched in the previous frame I_{t-1}. An exhaustive search is performed in a window centred at (x_j, y_j, z_j) with side length W voxels; the search space is represented as a set of candidate cubes C_j taken from the previous frame I_{t-1}. A motion vector is defined as pointing from the best matching cube B*_j^{t-1} to the cube B_j^t. When a candidate cube B'_j^{t-1} matches the cube B_j^t well, the voxels of B'_j^{t-1} can be used to predict the corresponding voxels of B_j^t; the prediction result is represented with a confusion matrix, shown in the following table:

| | Voxel in B_j^t is occupied (1) | Voxel in B_j^t is empty (0) |
|---|---|---|
| Voxel in B'_j^{t-1} is occupied (1) | True Positive (TP) | False Positive (FP) |
| Voxel in B'_j^{t-1} is empty (0) | False Negative (FN) | True Negative (TN) |

The matching degree between the candidate cube B'_j^{t-1} and the cube B_j^t is represented by precision or recall. Precision is the proportion of truly positive voxels among those predicted positive, i.e. precision = n(TP)/(n(TP)+n(FP)); recall is the proportion of truly positive voxels that are predicted positive, i.e. recall = n(TP)/(n(TP)+n(FN)), where n(TP), n(FP) and n(FN) denote the numbers of TP, FP and FN samples. The two indexes are balanced by the F-score, which evaluates the matching degree F_β of the two cubes:

F_β = (1 + β²) · precision · recall / (β² · precision + recall)

where β is the balance coefficient that weighs the relative importance of precision and recall.
Further, the specific operation flow of step S1.2.3 is as follows:
In the set of candidate cubes C_j, the candidate yielding the highest F-score with the cube B_j^t is the best matching cube B*_j^{t-1}; k candidate matching indexes F_1, ..., F_k are used to search k best matching cubes for each cube of the current frame.
Further, the specific operation flow of step S1.2.4 is as follows:
For a given candidate matching index F_k, after all best matching cubes B*_j^{t-1} have been obtained, each cube B_j^t is replaced by its best matching cube B*_j^{t-1} to generate one reference frame; since k candidate matching indexes are used, k reference frames are generated.
Further, the specific operation flow of step S1.2.5 is as follows:
A series of β values 0 < β_1 < ... < β_k < +∞ is used to generate the corresponding reference frames, and the conditional entropies H_{β_1}, ..., H_{β_k} of the reference frames are calculated; the β value corresponding to the minimum conditional entropy is selected, and the reference frame corresponding to that β value is the entropy-minimized motion compensation output.
The beneficial effects of the invention are as follows:
The point cloud video coding method based on the inter-frame implicit correlation fully utilizes the inter-frame redundant information of the dynamic frames to perform lossless compression on the point cloud video, effectively reduces the consumption of the streaming bandwidth of the point cloud video, and simultaneously utilizes the inter-frame implicit correlation to compress the point cloud video. Compared with the prior art, the method has obvious advantages in the point cloud video compression performance.
Detailed Description
The invention observes that inter-frame explicit correlation drops markedly on highly dynamic frames. The invention therefore identifies an inter-frame implicit correlation, namely the consistency of the topological structure between adjacent frames. This implicit correlation remains at a relatively high level even on highly dynamic frames, and thus has high potential to help compress point cloud video. To fully exploit the inter-frame implicit correlation, the invention adopts widely used entropy coding as the base encoder model, with a reference frame as side information; in the prior art the previous frame is used directly as the reference frame, which yields poor coding performance. Meanwhile, a smaller inter-frame conditional entropy provides a higher theoretical upper bound on the compression rate, and simply aligning adjacent frames with existing motion estimation methods cannot effectively reduce the inter-frame conditional entropy.
Therefore, the invention provides a point cloud video coding method based on inter-frame implicit correlation, which specifically comprises the following steps:
step S1, motion compensation with minimized entropy;
The goal of this step is to generate a reference frame that effectively increases the compression rate, and the effectiveness of the compression can be measured by the conditional entropy defined on the reference frame and the current frame. The specific operation steps are as follows:
s1.1, carrying out point cloud voxelization;
The point cloud is first voxelized, i.e., mapped into a three-dimensional grid. The voxelized point cloud frame is represented using a three-dimensional array, each member (voxel) of which is either empty (represented by 0) or occupied (represented by 1). The conditional entropy H is defined as follows:

H = -[P_0(P_{0|0} log2 P_{0|0} + P_{1|0} log2 P_{1|0}) + P_1(P_{0|1} log2 P_{0|1} + P_{1|1} log2 P_{1|1})]

wherein P_0 represents the probability that a voxel in the previous frame is empty and P_1 the probability that it is occupied; P_{0|0} and P_{1|0} represent the conditional probabilities that the voxel in the current frame is empty or occupied, respectively, when the co-located voxel in the previous frame is empty; P_{0|1} and P_{1|1} represent the conditional probabilities that the voxel in the current frame is empty or occupied, respectively, when the co-located voxel in the previous frame is occupied.
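The definition above can be estimated empirically from two binary voxel grids. The following is a minimal sketch, not the patented implementation; the function name and the NumPy-based grid representation are assumptions:

```python
import numpy as np

def conditional_entropy(prev: np.ndarray, curr: np.ndarray) -> float:
    """Empirical conditional entropy H(current | previous) of two binary voxel grids."""
    h = 0.0
    n = prev.size
    for i in (0, 1):                          # value of the voxel in the previous frame
        mask = (prev == i)
        p_i = mask.sum() / n                  # P_i: marginal of the previous frame
        if p_i == 0:
            continue
        for j in (0, 1):                      # value of the voxel in the current frame
            p_ji = (curr[mask] == j).mean()   # P_{j|i}: conditional probability
            if p_ji > 0:
                h -= p_i * p_ji * np.log2(p_ji)
    return h
```

Two identical frames give H = 0 (the reference predicts the current frame perfectly), while an uninformative reference leaves the full marginal entropy of the current frame.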
Step S1.2, generating a reference frame by adopting a motion compensation method, wherein the reference frame aligns a topological structure in the implicit correlation between frames and simultaneously minimizes a conditional entropy H, and the method specifically comprises the following 5 steps:
S1.2.1, dividing the frame into small cubes;
The current frame and the previous frame are denoted by I_t and I_{t-1}, respectively. The current frame I_t is divided into mutually disjoint cubes B_j^t with side length M voxels, each cube B_j^t located at position (x_j, y_j, z_j); the previous frame is divided in the same manner into mutually disjoint cubes B_j^{t-1}, with cube B_j^{t-1} located at position (x_j, y_j, z_j).
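The cube partition described above can be sketched as follows; the function name, the dictionary keyed by cube origin, and the assumption that each grid dimension divides evenly by M are illustrative choices, not taken from the patent:

```python
import numpy as np

def split_into_cubes(frame: np.ndarray, M: int):
    """Partition a voxel grid into mutually disjoint M x M x M cubes,
    keyed by their origin position (x, y, z).
    Assumes each grid dimension is a multiple of M."""
    cubes = {}
    X, Y, Z = frame.shape
    for x in range(0, X, M):
        for y in range(0, Y, M):
            for z in range(0, Z, M):
                cubes[(x, y, z)] = frame[x:x + M, y:y + M, z:z + M]
    return cubes
```

Applying the same partition to I_t and I_{t-1} yields aligned cube pairs at each position (x_j, y_j, z_j).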
Step S1.2.2, evaluating the matching degree between the two cubes by using an index;
For each cube B_j^t, the best matching cube B*_j^{t-1} is searched in the previous frame I_{t-1}. An exhaustive search is performed in a window centred at (x_j, y_j, z_j) with side length W voxels; the search space is represented as a set of candidate cubes C_j taken from the previous frame I_{t-1}. A motion vector is defined as pointing from the best matching cube B*_j^{t-1} to the cube B_j^t.
When a candidate cube B'_j^{t-1} matches the cube B_j^t well, the voxels of B'_j^{t-1} can be used to predict the corresponding voxels of B_j^t. Regarding this process as binary prediction, since each voxel is either 1 (positive) or 0 (negative), a confusion matrix is used to represent the prediction result, as shown in Table 1.
TABLE 1

| | Voxel in B_j^t is occupied (1) | Voxel in B_j^t is empty (0) |
|---|---|---|
| Voxel in B'_j^{t-1} is occupied (1) | True Positive (TP) | False Positive (FP) |
| Voxel in B'_j^{t-1} is empty (0) | False Negative (FN) | True Negative (TN) |

Here a voxel of the cube B_j^t with value 1 is occupied (positive) and with value 0 is empty (negative), and likewise for the candidate cube B'_j^{t-1}; True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN) denote the four prediction outcomes.
n(·) is used to denote the number of occurrences of a particular element of the confusion matrix. The matching degree of the two cubes (the candidate cube B'_j^{t-1} and the cube B_j^t) can be expressed by precision or recall. Specifically, precision is the proportion of truly positive voxels among those predicted positive, i.e. precision = n(TP)/(n(TP)+n(FP)), whereas recall is the proportion of truly positive voxels that are predicted positive, i.e. recall = n(TP)/(n(TP)+n(FN)). Because the two indexes are not unified, the F-score is used to balance them and evaluate the matching degree F_β of the two cubes:

F_β = (1 + β²) · precision · recall / (β² · precision + recall)

where β is the balance coefficient that weighs the relative importance of precision and recall.
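The matching-degree computation above can be sketched as follows, assuming binary NumPy cubes; the helper name f_beta and the convention of returning 0 when the cubes share no occupied voxel are assumptions:

```python
import numpy as np

def f_beta(curr_cube: np.ndarray, cand_cube: np.ndarray, beta: float) -> float:
    """Matching degree F_beta: the candidate cube's voxels act as binary
    predictions of the current cube's voxels."""
    tp = int(np.sum((curr_cube == 1) & (cand_cube == 1)))  # n(TP)
    fp = int(np.sum((curr_cube == 0) & (cand_cube == 1)))  # n(FP)
    fn = int(np.sum((curr_cube == 1) & (cand_cube == 0)))  # n(FN)
    if tp == 0:
        return 0.0                                         # no overlap: worst match
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)
```

With β = 1 this is the familiar F1 score; larger β weights recall more heavily, smaller β weights precision more heavily.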
Step S1.2.3, searching a cube with the best matching degree from cubes of the previous frame for each cube of the current frame;
In the set of candidate cubes C_j, the candidate yielding the highest F-score with the cube B_j^t is regarded as the best match. However, it is difficult to account for minimizing the conditional entropy while searching for the best matching cube, because the conditional entropy depends on the entire reference frame. To this end, the invention uses k candidate matching indexes F_1, ..., F_k to search k best matching cubes for each cube of the current frame.
Step S1.2.4, splicing each best matching cube to generate a reference frame;
Since step S1.2.3 uses k candidate matching indexes, k reference frames will be generated. For a given candidate matching index, after all best matching cubes B*_j^{t-1} have been obtained, each cube B_j^t is replaced by its best matching cube B*_j^{t-1} to generate one reference frame.
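Steps S1.2.3 and S1.2.4 — exhaustive window search followed by splicing — can be sketched as below for a single β value. This is a simplified sketch under assumed conventions (search window of side W centred on the cube, candidates clamped to the frame borders); the function names and parameters are illustrative, not taken from the patent:

```python
import numpy as np
from itertools import product

def f_beta(curr_cube, cand_cube, beta):
    """F_beta matching degree between two binary cubes (see step S1.2.2)."""
    tp = int(np.sum((curr_cube == 1) & (cand_cube == 1)))
    fp = int(np.sum((curr_cube == 0) & (cand_cube == 1)))
    fn = int(np.sum((curr_cube == 1) & (cand_cube == 0)))
    if tp == 0:
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

def build_reference_frame(prev, curr, M, W, beta):
    """For every M-cube of the current frame, exhaustively search a window of
    side W (centred on the cube) in the previous frame and splice the candidate
    with the highest F_beta score into the reference frame."""
    ref = np.zeros_like(curr)
    X, Y, Z = curr.shape
    half = (W - M) // 2          # largest offset keeping the window at side W
    for x, y, z in product(range(0, X, M), range(0, Y, M), range(0, Z, M)):
        target = curr[x:x + M, y:y + M, z:z + M]
        best, best_score = prev[x:x + M, y:y + M, z:z + M], -1.0
        for dx, dy, dz in product(range(-half, half + 1), repeat=3):
            u, v, w = x + dx, y + dy, z + dz
            if 0 <= u <= X - M and 0 <= v <= Y - M and 0 <= w <= Z - M:
                cand = prev[u:u + M, v:v + M, w:w + M]
                score = f_beta(target, cand, beta)
                if score > best_score:
                    best, best_score = cand, score
        ref[x:x + M, y:y + M, z:z + M] = best
    return ref
```

Running this for each of the k values β_1, ..., β_k yields the k reference frames described in step S1.2.4.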
Step S1.2.5, selecting a reference frame capable of minimizing conditional entropy;
The conditional entropy depends on the matching index F_β used to build the reference frame. However, when generating a reference frame it is not yet known which β value yields the reference frame that minimizes the conditional entropy. Therefore, the invention generates multiple reference frames simultaneously using multiple β values and calculates the conditional entropy of each. Let H_β denote the conditional entropy of the reference frame built with the matching index F_β. A series of values 0 < β_1 < ... < β_k < +∞ is used, the corresponding conditional entropies H_{β_1}, ..., H_{β_k} are calculated, and the β value with the smallest conditional entropy is selected; the corresponding reference frame is the entropy-minimized motion compensation output.
Step S2, inter-frame entropy coding;
The current frame is encoded using the inter-frame entropy encoding algorithm (S4D) and the reference frame generated in step S1 capable of minimizing conditional entropy as a context. The specific operation steps are as follows:
S4D encodes each element of the point cloud three-dimensional array using context-adaptive binary arithmetic coding (CABAC). Specifically, for any element value (0 or 1) in the three-dimensional array of the current frame, S4D uses the element value of the same position in the three-dimensional array of the reference frame as the context of CABAC encoding.
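A toy context model can illustrate why an entropy-minimized reference frame lowers the bit cost of step S2. The sketch below is not CABAC itself: it only measures the ideal code length −log2 p(symbol | context) under an adaptive count-based model whose context is the pair (left neighbour in the current frame, co-located voxel in the reference frame), as described above; all names and the Laplace-smoothed counting are assumptions:

```python
import numpy as np
from collections import defaultdict

def estimate_bits(curr: np.ndarray, ref: np.ndarray) -> float:
    """Ideal code length (in bits) of the current frame under an adaptive
    binary context model in the spirit of CABAC."""
    counts = defaultdict(lambda: [1, 1])   # context -> Laplace-smoothed [n0, n1]
    bits = 0.0
    X, Y, Z = curr.shape
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                left = int(curr[x - 1, y, z]) if x > 0 else 0
                ctx = (left, int(ref[x, y, z]))     # the two probability conditions
                c0, c1 = counts[ctx]
                sym = int(curr[x, y, z])
                p = (c1 if sym else c0) / (c0 + c1)
                bits -= float(np.log2(p))           # ideal arithmetic-coding cost
                counts[ctx][sym] += 1
    return bits
```

A reference frame that predicts the current frame well concentrates the conditional distribution in each context, so the estimated bit count drops sharply compared with an uninformative reference.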
In the prior art, the previous frame is used directly as the reference frame, exploiting only the explicit correlation between frames (information repeated at the same or adjacent positions in adjacent frames); compression is therefore inefficient on highly dynamic frames. The invention instead generates, through step S1, a reference frame that minimizes the conditional entropy, which greatly benefits inter-frame entropy coding, effectively exploits the inter-frame implicit correlation, and effectively improves the compression rate on dynamic frames. Prior art that uses many voxels of the current frame and the reference frame as probability conditions in entropy coding incurs high computational complexity, yielding a decoded frame rate below 1 FPS. The invention uses only the left-adjacent voxel of the voxel being coded and the co-located voxel in the reference frame as probability conditions, which reduces the computational complexity. The invention is therefore better suited to mobile streaming systems.
In order to verify the effect of the point cloud video coding method based on inter-frame implicit correlation, the three public datasets Ricardo, Pizza and Longdress are used for experimental comparison between the prior art and the proposed method. The results show that, compared with existing inter-frame encoders that exploit explicit inter-frame correlation, the invention reduces bandwidth consumption by 23.15%, 1.06% and 43.32% on the three datasets, respectively. Compared with the prior art, the method has obvious advantages in point cloud video compression performance.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.
Claims (7)
1. The point cloud video coding method based on the inter-frame implicit correlation is characterized by comprising the following steps of:
step S1, motion compensation with minimized entropy;
Step S1.1, voxelizing the point cloud, namely mapping the point cloud into a three-dimensional grid;
Step S1.2, generating a reference frame by adopting a motion compensation method, wherein the reference frame aligns a topological structure in implicit correlation between frames and simultaneously minimizes conditional entropy;
S1.2.1, dividing the frame into small cubes;
Step S1.2.2, evaluating the matching degree between the two cubes by using an index;
Step S1.2.3, searching a cube with the best matching degree from cubes of the previous frame for each cube of the current frame;
Step S1.2.4, splicing each best matching cube to generate a reference frame;
step S1.2.5, selecting a reference frame capable of minimizing conditional entropy as motion compensation output with minimized entropy;
step S2, inter-frame entropy coding;
The current frame is encoded using the inter-frame entropy encoding algorithm S4D and the reference frame generated in step S1 capable of minimizing conditional entropy as a context.
2. The method for point cloud video coding based on inter-frame implicit correlation according to claim 1, wherein the specific operation flow of step S1.1 is as follows:
The voxelized point cloud frame is represented by a three-dimensional array; each member of the array, i.e. each voxel, is either empty or occupied. The conditional entropy H is defined as follows:

H = -[P_0(P_{0|0} log2 P_{0|0} + P_{1|0} log2 P_{1|0}) + P_1(P_{0|1} log2 P_{0|1} + P_{1|1} log2 P_{1|1})]

wherein P_0 represents the probability that a voxel in the previous frame is empty and P_1 the probability that it is occupied; P_{0|0} and P_{1|0} represent the conditional probabilities that the voxel in the current frame is empty or occupied, respectively, when the co-located voxel in the previous frame is empty; P_{0|1} and P_{1|1} represent the conditional probabilities that the voxel in the current frame is empty or occupied, respectively, when the co-located voxel in the previous frame is occupied.
3. The method for point cloud video coding based on inter-frame implicit correlation according to claim 1, wherein the specific operation flow of step S1.2.1 is as follows:
The current frame and the previous frame are denoted by I_t and I_{t-1}, respectively. The current frame I_t is divided into mutually disjoint cubes B_j^t with side length M voxels, each cube B_j^t located at position (x_j, y_j, z_j).
4. The method for point cloud video encoding based on inter-frame implicit correlation of claim 3, wherein the specific operation procedure of step S1.2.2 is as follows:
For each cube B_j^t, the best matching cube B*_j^{t-1} is searched in the previous frame I_{t-1}. An exhaustive search is performed in a window centred at (x_j, y_j, z_j) with side length W voxels; the search space is represented as a set of candidate cubes C_j taken from the previous frame I_{t-1}. A motion vector is defined as pointing from the best matching cube B*_j^{t-1} to the cube B_j^t. When a candidate cube B'_j^{t-1} matches the cube B_j^t well, the voxels of B'_j^{t-1} are used to predict the corresponding voxels of B_j^t; the prediction result is represented with a confusion matrix, shown in the following table:

| | Voxel in B_j^t is occupied (1) | Voxel in B_j^t is empty (0) |
|---|---|---|
| Voxel in B'_j^{t-1} is occupied (1) | True Positive (TP) | False Positive (FP) |
| Voxel in B'_j^{t-1} is empty (0) | False Negative (FN) | True Negative (TN) |

Here a voxel of the cube B_j^t with value 1 is occupied (positive) and with value 0 is empty (negative), and likewise for the candidate cube B'_j^{t-1}; True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN) denote the four prediction outcomes.
The matching degree between the candidate cube B'_j^{t-1} and the cube B_j^t is represented by precision or recall. Precision is the proportion of truly positive voxels among those predicted positive, i.e. precision = n(TP)/(n(TP)+n(FP)); recall is the proportion of truly positive voxels that are predicted positive, i.e. recall = n(TP)/(n(TP)+n(FN)), where n(TP), n(FP) and n(FN) denote the numbers of TP, FP and FN samples. The two indexes are balanced by the F-score, which evaluates the matching degree F_β of the two cubes:

F_β = (1 + β²) · precision · recall / (β² · precision + recall)

where β is the balance coefficient that weighs the relative importance of precision and recall.
5. The method for point cloud video encoding based on inter-frame implicit correlation of claim 4, wherein the specific operation procedure of step S1.2.3 is as follows:
In the set of candidate cubes C_j, the candidate yielding the highest F-score with the cube B_j^t is the best matching cube B*_j^{t-1}; k candidate matching indexes F_1, ..., F_k are used to search k best matching cubes for each cube of the current frame.
6. The method for point cloud video encoding based on inter-frame implicit correlation of claim 5, wherein the specific operation flow of step S1.2.4 is as follows:
For a given candidate matching index F_k, after all best matching cubes B*_j^{t-1} have been obtained, each cube B_j^t is replaced by its best matching cube B*_j^{t-1} to generate one reference frame; since k candidate matching indexes are used, k reference frames are generated.
7. The method for point cloud video encoding based on inter-frame implicit correlation of claim 6, wherein the specific operation procedure of step S1.2.5 is as follows:
A series of β values 0 < β_1 < ... < β_k < +∞ is used to generate the corresponding reference frames, and the conditional entropies H_{β_1}, ..., H_{β_k} of the reference frames are calculated; the β value corresponding to the minimum conditional entropy is selected, and the reference frame corresponding to that β value is the entropy-minimized motion compensation output.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310865197.4A CN116800979B (en) | 2023-07-14 | 2023-07-14 | Point cloud video coding method based on inter-frame implicit correlation |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310865197.4A CN116800979B (en) | 2023-07-14 | 2023-07-14 | Point cloud video coding method based on inter-frame implicit correlation |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116800979A CN116800979A (en) | 2023-09-22 |
| CN116800979B true CN116800979B (en) | 2025-05-27 |
Family
ID=88049693
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310865197.4A Active CN116800979B (en) | 2023-07-14 | 2023-07-14 | Point cloud video coding method based on inter-frame implicit correlation |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116800979B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112565764A (en) * | 2020-12-03 | 2021-03-26 | 西安电子科技大学 | Point cloud geometric information interframe coding and decoding method |
| CN113766228A (en) * | 2020-06-05 | 2021-12-07 | Oppo广东移动通信有限公司 | Point cloud compression method, encoder, decoder and storage medium |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11297346B2 (en) * | 2016-05-28 | 2022-04-05 | Microsoft Technology Licensing, Llc | Motion-compensated compression of dynamic voxelized point clouds |
| US11375194B2 (en) * | 2019-11-16 | 2022-06-28 | Uatc, Llc | Conditional entropy coding for efficient video compression |
| CN114095735B (en) * | 2020-08-24 | 2025-09-12 | Peking University Shenzhen Graduate School | A point cloud geometry inter-frame prediction method based on block motion estimation and motion compensation |
| CN116016953A (en) * | 2022-12-27 | 2023-04-25 | Sun Yat-sen University (Shenzhen) | A dynamic point cloud attribute compression method based on deep entropy coding |
- 2023-07-14: Application CN202310865197.4A filed in China; granted as CN116800979B, currently active
Also Published As
| Publication number | Publication date |
|---|---|
| CN116800979A (en) | 2023-09-22 |
Similar Documents
| Publication | Title |
|---|---|
| CN114095735B (en) | A point cloud geometry inter-frame prediction method based on block motion estimation and motion compensation |
| US11769275B2 | Method and device for predictive encoding/decoding of a point cloud |
| CN1183767C (en) | Video encoding method based on the matching pursuit algorithm |
| CN101610413B (en) | Video coding/decoding method and device |
| CN100471278C | A multi-view video compression codec method based on distributed source coding |
| CN116437089B | A deep video compression method based on key object |
| CN103379333A | Encoding and decoding method, encoding and decoding of video sequence code streams and device corresponding to methods |
| Nguyen et al. | Deep probabilistic model for lossless scalable point cloud attribute compression |
| CN116016951A | Point cloud processing method, device, equipment and storage medium |
| Zhai et al. | Disparity-based stereo image compression with aligned cross-view priors |
| WO2022247705A1 | Prediction coding and decoding method and apparatus for point cloud attribute information |
| CN102833536A | Distributed video encoding and decoding method for wireless sensor networks |
| Wang et al. | Bandwidth-efficient mobile volumetric video streaming by exploiting inter-frame correlation |
| CN101980536A | A multi-view stereoscopic video compression codec method based on object and fractal |
| CN114025146A | Dynamic point cloud geometric compression method based on scene flow network and time entropy model |
| Rudolph et al. | Progressive coding for deep learning based point cloud attribute compression |
| Tian et al. | Towards real-time neural video codec for cross-platform application using calibration information |
| CN116800979B (en) | Point cloud video coding method based on inter-frame implicit correlation |
| WO2023093377A1 | Encoding method, decoding method and electronic device |
| CN119341575B (en) | A fractal data compression and storage method, system, and related equipment |
| Yu et al. | Hierarchical distortion learning for fast lossy compression of point clouds |
| CN102263952B | Quick fractal compression and decompression method for binocular stereo video based on object |
| Ma et al. | Rate-distortion optimized point cloud preprocessing for geometry-based point cloud compression |
| Hoang et al. | KeyNode-driven geometry coding for real-world scanned human dynamic mesh compression |
| Fei et al. | Review of distributed video coding |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |