CN120303940A - Point cloud encoding and decoding method, device, equipment and storage medium - Google Patents
- Publication number: CN120303940A
- Application number: CN202380078227.5A
- Authority: CN (China)
- Legal status: Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
Abstract
The application provides a point cloud encoding and decoding method, a device, equipment and a storage medium, wherein the method comprises the following steps: determining N prediction nodes of the current node in the prediction reference frame of the current frame to be coded, and performing predictive encoding and decoding on the coordinate information of the points in the current node based on the geometric coding and decoding information of the points in the N prediction nodes. That is, the embodiment of the application optimizes node coding when DCM direct encoding and decoding is performed: by taking the temporal correlation between adjacent frames into account, the geometric information of the points in the IDCM node to be coded (i.e. the current node) is predictively coded and decoded using the geometric information of the prediction nodes in the prediction reference frame, which further improves the geometric information coding and decoding efficiency of the point cloud.
Description
The present application relates to the field of point cloud technologies, and in particular, to a method, an apparatus, a device, and a storage medium for encoding and decoding a point cloud.
Point cloud data is formed by capturing an object surface with acquisition equipment, and may contain hundreds of thousands of points or more. In the video production process, the point cloud data is transmitted between the point cloud encoding device and the point cloud decoding device in the form of point cloud media files. However, such a huge number of points presents a challenge for transmission; therefore, the point cloud encoding device needs to compress the point cloud data before transmitting it.
Compression of the point cloud is also called encoding of the point cloud. In the process of encoding the point cloud, for points at isolated positions in geometric space, complexity can be greatly reduced by using the Inferred Direct Coding Mode (IDCM). When the current node is coded with a direct coding mode, the geometric information of the points in the current node is coded directly. However, when the geometric information of the points in the current node is currently coded, inter-frame information is not considered, which reduces the coding and decoding performance of the point cloud.
Disclosure of Invention
The embodiment of the application provides a point cloud coding and decoding method, device, equipment and storage medium, which take inter-frame information into consideration when the geometrical information of points in nodes is coded, so that the coding and decoding performance of point clouds is improved.
In a first aspect, an embodiment of the present application provides a point cloud decoding method, including:
In a prediction reference frame of a current frame to be decoded, determining N prediction nodes of a current node, wherein the current node is a node to be decoded in the current frame to be decoded, and N is a positive integer;
and carrying out predictive decoding on the coordinate information of the points in the current node based on the geometric decoding information of the N predictive nodes.
In a second aspect, the present application provides a point cloud encoding method, including:
in a prediction reference frame of a current frame to be encoded, determining N prediction nodes of a current node, wherein the current node is a node to be encoded in the current frame to be encoded, and N is a positive integer;
And carrying out predictive coding on the coordinate information of the points in the current node based on the geometric coding information of the N predictive nodes.
In a third aspect, the present application provides a point cloud decoding apparatus for performing the method in the first aspect or each implementation manner thereof. In particular, the apparatus comprises a functional unit for performing the method of the first aspect described above or in various implementations thereof.
In a fourth aspect, the present application provides a point cloud encoding apparatus for performing the method in the second aspect or each implementation manner thereof. In particular, the apparatus comprises functional units for performing the method of the second aspect described above or in various implementations thereof.
In a fifth aspect, a point cloud decoder is provided that includes a processor and a memory. The memory is for storing a computer program and the processor is for calling and running the computer program stored in the memory for performing the method of the first aspect or implementations thereof.
In a sixth aspect, a point cloud encoder is provided that includes a processor and a memory. The memory is for storing a computer program and the processor is for invoking and running the computer program stored in the memory to perform the method of the second aspect or implementations thereof described above.
In a seventh aspect, a point cloud codec system is provided, including a point cloud encoder and a point cloud decoder. The point cloud decoder is configured to perform the method of the first aspect or its respective implementation forms, and the point cloud encoder is configured to perform the method of the second aspect or its respective implementation forms.
In an eighth aspect, a chip is provided for implementing the method of any one of the first to second aspects or implementations thereof. In particular, the chip comprises a processor for calling and running a computer program from a memory, such that a device on which the chip is installed performs the method of any one of the above first to second aspects or implementations thereof.
In a ninth aspect, a computer-readable storage medium is provided for storing a computer program for causing a computer to perform the method of any one of the above first to second aspects or implementations thereof.
In a tenth aspect, there is provided a computer program product comprising computer program instructions for causing a computer to perform the method of any one of the first to second aspects or implementations thereof.
In an eleventh aspect, there is provided a computer program which, when run on a computer, causes the computer to perform the method of any one of the above-described first to second aspects or implementations thereof.
In a twelfth aspect, there is provided a code stream generated based on the method of the second aspect. Optionally, the code stream includes at least one of a first parameter and a second parameter.
Based on the above technical scheme, when the current node in the current frame is coded or decoded, N prediction nodes of the current node are determined in the prediction reference frame of the current frame to be coded or decoded, and the coordinate information of the points in the current node is predictively coded or decoded based on the geometric coding and decoding information of the points in the N prediction nodes. That is, the embodiment of the application optimizes node coding when DCM direct encoding and decoding is performed: by taking the temporal correlation between adjacent frames into account, the geometric information of the points in the IDCM node to be coded (i.e. the current node) is predictively coded and decoded using the geometric information of the prediction nodes in the prediction reference frame, which further improves the geometric information coding and decoding efficiency of the point cloud.
FIG. 1A is a schematic view of a point cloud;
FIG. 1B is a partial enlarged view of a point cloud;
FIG. 2 is a schematic view of six viewing angles of a point cloud image;
FIG. 3 is a schematic block diagram of a point cloud codec system according to an embodiment of the present application;
FIG. 4A is a schematic block diagram of a point cloud encoder provided by an embodiment of the present application;
FIG. 4B is a schematic block diagram of a point cloud decoder provided by an embodiment of the present application;
FIG. 5A is a schematic plan view;
FIG. 5B is a schematic diagram of a node encoding sequence;
FIG. 5C is a schematic plan view of a planar logo;
FIG. 5D is a schematic illustration of sibling nodes;
FIG. 5E is a schematic view of the intersection of a lidar with a node;
FIG. 5F is a schematic diagram of neighboring nodes at the same depth and the same coordinates of the same partition;
FIG. 5G is a schematic diagram of a neighborhood node when the node is in a parent node low plane position;
FIG. 5H is a schematic diagram of a neighborhood node when the node is in a parent node high plane position;
FIG. 5I is a schematic diagram of predictive coding of laser radar point cloud plane location information;
FIG. 6A is a schematic diagram of IDCM encoding;
FIG. 6B is a schematic diagram of coordinate transformation of a point cloud acquired by a rotating lidar;
FIG. 6C is a schematic diagram of predictive coding in the X or Y axis direction;
FIG. 6D is a schematic view of the prediction of the angle of the X or Y plane by horizontal azimuth;
FIG. 6E is a schematic diagram of predictive coding on the X or Y axis;
FIGS. 7A-7C are diagrams of triangular patch based geometric information encoding;
FIG. 8A is a schematic diagram of an encoding framework of an AVS;
FIG. 8B is a schematic diagram of a decoding framework of an AVS;
FIG. 9A is a schematic diagram of reference nodes selected by each child node;
FIG. 9B is a schematic diagram of 4 sets of reference neighbor nodes for the current node;
FIG. 9C is a schematic diagram of sub-blocks corresponding to 6 adjacent parent blocks, respectively;
FIG. 9D is a diagram showing the Morton sequence numbering of surrounding 18 neighboring blocks utilized by a current block to be encoded;
FIG. 9E is a simplified prediction tree diagram;
FIG. 10 is a schematic flow chart of a point cloud decoding method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of octree partitioning;
FIG. 12 is a schematic diagram of a predictive node;
FIG. 13 is a schematic diagram of a neighborhood node;
FIG. 14 is a schematic diagram of corresponding nodes of a neighborhood node;
FIG. 15A is a schematic diagram of a predicted node of a current node in a predicted reference frame;
FIG. 15B is a schematic diagram of a current node in two predicted reference frames;
FIG. 16A is a schematic diagram of IDCM encoding;
FIG. 16B is a schematic diagram of IDCM decoding;
FIG. 17 is a schematic flow chart of a point cloud encoding method according to an embodiment of the present application;
FIG. 18 is a schematic block diagram of a point cloud decoding apparatus provided by an embodiment of the present application;
FIG. 19 is a schematic block diagram of a point cloud encoding apparatus provided by an embodiment of the present application;
FIG. 20 is a schematic block diagram of an electronic device provided by an embodiment of the present application;
FIG. 21 is a schematic block diagram of a point cloud codec system provided by an embodiment of the present application.
The application can be applied to the technical field of point cloud coding and decoding, for example, the technical field of point cloud compression.
In order to facilitate understanding of the embodiments of the present application, the following brief description will be first given of related concepts related to the embodiments of the present application:
Point Cloud (Point Cloud) refers to a set of irregularly distributed discrete points in space that represent the spatial structure and surface properties of a three-dimensional object or scene. Fig. 1A is a schematic view of a three-dimensional point cloud image, fig. 1B is a partial enlarged view of fig. 1A, and as can be seen from fig. 1A and 1B, a point cloud surface is composed of densely distributed points.
In a two-dimensional image, every pixel position carries information and the pixels are regularly distributed, so the position information does not need to be recorded separately. The distribution of points of a point cloud in three-dimensional space, however, is random and irregular, so the position of each point in space must be recorded for the point cloud to be expressed completely. Similar to a two-dimensional image, each position acquired has corresponding attribute information.
Point Cloud Data is the concrete recorded form of a point cloud; points in the point cloud may include position information of the points and attribute information of the points. For example, the position information of a point may be its three-dimensional coordinate information; the position information of points may also be referred to as the geometric information of points. The attribute information of points may include color information, reflectance information, normal vector information, and the like. The color information reflects the color of the object, and the reflectance information reflects the surface texture of the object. The color information may be in any color space: for example, red-green-blue (RGB), or luminance-chrominance (YCbCr / YUV) information, where Y denotes luminance (Luma), Cb (U) denotes the blue color difference, Cr (V) denotes the red color difference, and U and V together express the chrominance (Chroma) describing color-difference information. For example, in a point cloud obtained according to the laser measurement principle, the points may include three-dimensional coordinate information of the points and the laser reflection intensity (reflectance) of the points. In a point cloud obtained according to the photogrammetry principle, the points may include three-dimensional coordinate information of the points and color information of the points. In a point cloud obtained by combining laser measurement and photogrammetry principles, the points may include three-dimensional coordinate information of the points, the laser reflection intensity (reflectance) of the points, and color information of the points. Fig. 2 shows six viewing angles of a point cloud image, and Table 1 describes a point cloud data storage format consisting of a file header information part and a data part:
TABLE 1
In Table 1, the header information includes the data format, the data expression type, the total point number of the point cloud, and the content expressed by the point cloud. For example, the point cloud in this example is in ".ply" format, expressed in ASCII code, with a total of 207242 points, and each point has three-dimensional position information XYZ and three-dimensional color information RGB.
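The body of Table 1 is not reproduced in this text. Purely as an illustration of the described layout, an ASCII ".ply" file with a header information part followed by a data part would look roughly like the sketch below; the property names and the sample data line are assumptions, not the patent's table:

```
ply
format ascii 1.0
element vertex 207242
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header
0.0 0.0 0.0 255 255 255
...
```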
The point cloud can flexibly and conveniently express the spatial structure and surface attributes of a three-dimensional object or scene. Because a point cloud is obtained by directly sampling a real object, it can provide a strong sense of realism while ensuring accuracy, so its range of application is wide, including virtual reality games, computer-aided design, geographic information systems, automatic navigation systems, digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive telepresence, three-dimensional reconstruction of biological tissues and organs, and the like.
The acquisition paths of point cloud data may include, but are not limited to, at least one of the following. (1) Computer device generation: a computer device may generate point cloud data for a virtual three-dimensional object or a virtual three-dimensional scene. (2) 3D (3-Dimension) laser scanning acquisition: point cloud data of a static real-world three-dimensional object or three-dimensional scene can be acquired by 3D laser scanning, with millions of points acquired per second. (3) 3D photogrammetry acquisition: a real-world visual scene is captured by 3D photographing equipment (i.e. a group of cameras, or a camera device with multiple lenses and sensors) to obtain point cloud data of the scene; point cloud data of dynamic real-world three-dimensional objects or scenes can be obtained in this way. (4) Medical equipment acquisition: in the medical field, point cloud data of biological tissues and organs can be acquired through medical equipment such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT) and electromagnetic positioning information.
The point cloud can be divided into dense point clouds and sparse point clouds according to the acquisition approach.
The point cloud is divided into the following types according to the time sequence of the data:
the first type, static point cloud: the object is stationary, and the equipment acquiring the point cloud is also stationary;
the second type, dynamic point cloud: the object is moving, but the equipment acquiring the point cloud is stationary;
the third type, dynamically acquired point cloud: the equipment acquiring the point cloud is moving.
The applications of the point cloud are divided into two main types:
first, machine-perception point clouds, which can be used in scenes such as autonomous navigation systems, real-time inspection systems, geographic information systems, visual sorting robots and emergency rescue robots;
second, human-eye-perception point clouds, which can be used in point cloud application scenes such as digital cultural heritage, free-viewpoint broadcasting, three-dimensional immersive communication and three-dimensional immersive interaction.
Point cloud acquisition technology has reduced the cost and time period of point cloud data acquisition and improved the data precision. The transformation of the point cloud acquisition mode makes it possible to obtain large amounts of point cloud data, but with growing application demands, the processing of massive 3D point cloud data runs into the bottleneck of limited storage space and transmission bandwidth.
Take a point cloud video with a frame rate of 30 fps (frames per second) as an example, in which each frame contains 700,000 points and each point has coordinate information xyz (float) and color information RGB (uchar). The data size of a 10 s point cloud video is then about 0.7 million × (4 bytes × 3 + 1 byte × 3) × 30 fps × 10 s ≈ 3.15 GB. By comparison, for a 1280×720 two-dimensional video with a frame rate of 24 fps, the data size of 10 s of video is about 1280 × 720 × 12 bit × 24 frames × 10 s ≈ 0.33 GB, and 10 s of the corresponding two-view 3D video is about 0.33 × 2 = 0.66 GB. It can be seen that the data volume of point cloud video far exceeds that of two-dimensional and three-dimensional video of the same duration. Therefore, in order to better manage the data, save server storage space, and reduce the transmission traffic and transmission time between server and client, point cloud compression has become a key problem in promoting the development of the point cloud industry.
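As a quick sanity check, the above figures can be reproduced with the following minimal sketch (the frame counts and byte widths are taken directly from the text):

```python
# Point cloud video: 0.7 million points per frame, xyz as 3 floats (4 bytes each),
# RGB as 3 uchars (1 byte each), 30 fps, 10 s.
pc_bytes = 0.7e6 * (4 * 3 + 1 * 3) * 30 * 10
print(pc_bytes / 1e9)            # ~3.15 GB

# 1280x720 two-dimensional video, 12 bits per pixel, 24 fps, 10 s.
video_bits = 1280 * 720 * 12 * 24 * 10
print(video_bits / 8 / 1e9)      # ~0.33 GB; two-view 3D video: ~0.66 GB
```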
The following describes the relevant knowledge of the point cloud codec.
Fig. 3 is a schematic block diagram of a point cloud codec system according to an embodiment of the present application. It should be noted that fig. 3 is only an example, and the point cloud codec system according to the embodiment of the present application includes but is not limited to the one shown in fig. 3. As shown in fig. 3, the point cloud codec system 100 includes an encoding device 110 and a decoding device 120. Wherein the encoding device is configured to encode (which may be understood as compressing) the point cloud data to generate a code stream, and to transmit the code stream to the decoding device. The decoding device decodes the code stream generated by the encoding device to obtain decoded point cloud data.
The encoding device 110 of the embodiment of the present application may be understood as a device having a point cloud encoding function, and the decoding device 120 may be understood as a device having a point cloud decoding function. The encoding device 110 and the decoding device 120 encompass a wide range of apparatuses, such as smartphones, desktop computers, mobile computing apparatuses, notebook (e.g., laptop) computers, tablet computers, set-top boxes, televisions, cameras, display apparatuses, digital media players, point cloud game consoles, vehicle-mounted computers, and the like.
In some embodiments, the encoding device 110 may transmit the encoded point cloud data (e.g., a code stream) to the decoding device 120 via the channel 130. The channel 130 may include one or more media and/or devices capable of transmitting encoded point cloud data from the encoding device 110 to the decoding device 120.
In one example, channel 130 includes one or more communication media that enable encoding device 110 to transmit encoded point cloud data directly to decoding device 120 in real-time. In this example, the encoding apparatus 110 may modulate the encoded point cloud data according to a communication standard and transmit the modulated point cloud data to the decoding apparatus 120. Where the communication medium comprises a wireless communication medium, such as a radio frequency spectrum, the communication medium may optionally also comprise a wired communication medium, such as one or more physical transmission lines.
In another example, channel 130 includes a storage medium that may store point cloud data encoded by encoding device 110. Storage media include a variety of locally accessed data storage media such as compact discs, DVDs, flash memory, and the like. In this example, the decoding device 120 may obtain encoded point cloud data from the storage medium.
In another example, the channel 130 may comprise a storage server that may store the point cloud data encoded by the encoding device 110. In this example, the decoding device 120 may download the stored encoded point cloud data from the storage server. Alternatively, the storage server may store the encoded point cloud data and transmit it to the decoding device 120; it may be, for example, a web server (e.g., for a website) or a File Transfer Protocol (FTP) server.
In some embodiments, the encoding apparatus 110 includes a point cloud encoder 112 and an output interface 113. Wherein the output interface 113 may comprise a modulator/demodulator (modem) and/or a transmitter.
In some embodiments, the encoding device 110 may include a point cloud source 111 in addition to the point cloud encoder 112 and the output interface 113.
The point cloud source 111 may include at least one of a point cloud acquisition device (e.g., scanner), a point cloud archive, a point cloud input interface for receiving point cloud data from a point cloud content provider, a computer graphics system for generating point cloud data.
The point cloud encoder 112 encodes point cloud data from the point cloud source 111 to generate a code stream. The point cloud encoder 112 directly transmits the encoded point cloud data to the decoding device 120 via the output interface 113. The encoded point cloud data may also be stored on a storage medium or storage server for subsequent reading by the decoding device 120.
In some embodiments, decoding device 120 includes an input interface 121 and a point cloud decoder 122.
In some embodiments, the decoding apparatus 120 may further include a display device 123 in addition to the input interface 121 and the point cloud decoder 122.
Wherein the input interface 121 comprises a receiver and/or a modem. The input interface 121 may receive the encoded point cloud data through the channel 130.
The point cloud decoder 122 is configured to decode the encoded point cloud data to obtain decoded point cloud data, and transmit the decoded point cloud data to the display device 123.
The display device 123 displays the decoded point cloud data. The display device 123 may be integral with the decoding apparatus 120 or external to the decoding apparatus 120. The display device 123 may include a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or other types of display devices.
In addition, fig. 3 is only an example, and the technical solution of the embodiment of the present application is not limited to fig. 3, for example, the technology of the present application may also be applied to single-sided point cloud encoding or single-sided point cloud decoding.
Current point cloud encoders may adopt one of the two point cloud compression coding technical routes proposed by the Moving Picture Experts Group (MPEG) of the international standards organization: projection-based point cloud compression (Video-based Point Cloud Compression, VPCC) and geometry-based point cloud compression (Geometry-based Point Cloud Compression, GPCC). VPCC projects the three-dimensional point cloud to two dimensions and encodes the projected two-dimensional images with existing two-dimensional coding tools; GPCC uses a hierarchical structure to divide the point cloud step by step into units and encodes the whole point cloud by encoding a record of the dividing process.
A point cloud encoder and a point cloud decoder to which the embodiments of the present application are applicable are described below by taking GPCC codec frames as an example.
Fig. 4A is a schematic block diagram of a point cloud encoder provided by an embodiment of the present application.
As described above, the points in the point cloud may include position information of the points and attribute information of the points; therefore, the encoding of points in the point cloud mainly includes position encoding and attribute encoding. In some examples, the position information of points in the point cloud is also referred to as geometric information, and correspondingly the position encoding of points in the point cloud may also be referred to as geometric encoding.
In GPCC coding framework, the geometric information and corresponding attribute information of the point cloud are separately coded.
As shown in fig. 4A, the current geometry codec of G-PCC can be divided into octree-based geometry coding and prediction-tree-based geometry coding.
The process of position coding includes: preprocessing the points in the point cloud, e.g. coordinate transformation, quantization and removal of duplicate points; then geometrically encoding the preprocessed point cloud, i.e. constructing an octree or a prediction tree and performing geometric coding based on the constructed octree or prediction tree, to form a geometric code stream. At the same time, the position information of each point in the point cloud data is reconstructed based on the position information output by the constructed octree or prediction tree, to obtain the reconstructed value of the position information of each point.
The attribute coding process includes: given the reconstructed position information and the original values of the attribute information of the input point cloud, selecting one of three prediction modes to perform point cloud prediction, quantizing the prediction result, and performing arithmetic coding to form an attribute code stream.
As shown in fig. 4A, the position encoding may be achieved by:
a coordinate transformation (Transform coordinates) unit 201, a voxelize (Voxelize) unit 202, an octree partitioning (Analyze octree) unit 203, a geometric reconstruction (Reconstruct geometry) unit 204, an arithmetic encoding (Arithmetic encode) unit 205, a surface fitting (Analyze surface approximation) unit 206, and a prediction tree construction unit 207.
The coordinate conversion unit 201 may be used to convert world coordinates of points in the point cloud into relative coordinates. For example, the minimum values of the x, y and z coordinate axes are respectively subtracted from the geometric coordinates of the points, which is equivalent to a DC-removal operation, transforming the coordinates of the points in the point cloud from world coordinates to relative coordinates.
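A minimal sketch of this DC-removal step, assuming the points are stored as an N×3 array (numpy is used purely for illustration):

```python
import numpy as np

def transform_coordinates(points: np.ndarray) -> np.ndarray:
    """Convert world coordinates to relative coordinates by subtracting
    the per-axis minimum (the DC-removal described above)."""
    return points - points.min(axis=0)
```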
The voxelize (Voxelize) unit 202, also referred to as the quantize and remove duplicate points (Quantize and remove points) unit, may reduce the number of distinct coordinates by quantization. After quantization, originally different points may be assigned the same coordinates; on this basis, duplicate points may be removed by a deduplication operation. For example, multiple points with the same quantized position but different attribute information may be merged into one point through attribute transfer. In some embodiments of the present application, the voxelize unit 202 is an optional module.
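A sketch of quantization followed by duplicate-point removal, under the assumption of a single uniform scale factor (the attribute-merging step is omitted):

```python
import numpy as np

def voxelize(points: np.ndarray, scale: float) -> np.ndarray:
    """Quantize coordinates and drop points that collapse onto the same
    quantized position (duplicate-point removal)."""
    quantized = np.floor(points * scale).astype(np.int64)
    return np.unique(quantized, axis=0)
```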
The octree partitioning unit 203 may encode the position information of the quantized points using an octree (octree) coding scheme. For example, the point cloud is divided in the form of an octree, so that the positions of the points correspond one-to-one to positions in the octree; geometric encoding is then performed by recording the occupied positions in the octree and marking each with a flag of 1.
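The 8-bit occupancy code of a node can be assembled from its occupied children as in the following sketch; the child index convention (index = 4x + 2y + z within the parent) is an assumption for illustration:

```python
def occupancy_code(child_occupied: list) -> int:
    """Pack the occupancy flags of the 8 children into one byte,
    marking each occupied child with a 1 bit."""
    code = 0
    for i, occupied in enumerate(child_occupied):
        if occupied:
            code |= 1 << i
    return code

# Example: children 0, 1, 4, 5 occupied -> code 0b110011 (bit i = child i).
print(bin(occupancy_code([1, 1, 0, 0, 1, 1, 0, 0])))
```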
In some embodiments, in the process of encoding geometric information based on triangle soup (trisoup), the point cloud is also octree-partitioned by the octree partitioning unit 203. Unlike octree-based geometric information coding, however, trisoup does not need to divide the point cloud step by step down to unit cubes with side length 1×1×1; instead, the division stops once the point cloud has been divided into blocks (sub-blocks) with side length W. Based on the surface formed by the distribution of the point cloud in each block, at most twelve vertices (intersection points) produced by that surface and the twelve edges of the block are obtained; the intersection points are surface-fitted by the surface fitting unit 206, and the fitted intersection points are geometrically encoded.
The prediction tree construction unit 207 may encode the position information of the quantized points using a prediction tree coding scheme. For example, the point cloud is divided in the form of a prediction tree, so that the positions of the points correspond one-to-one to the nodes of the prediction tree. By traversing the points in the prediction tree, different prediction modes are selected to predict the geometric position information of each node, yielding prediction residuals, and the geometric prediction residuals are quantized using quantization parameters. Finally, through continuous iteration, the prediction residuals of the position information of the prediction tree nodes, the prediction tree structure, the quantization parameters and so on are encoded, generating the binary code stream.
The geometric reconstruction unit 204 may perform position reconstruction based on the position information output by the octree dividing unit 203 or the intersection point fitted by the surface fitting unit 206, to obtain a reconstructed value of the position information of each point in the point cloud data. Or performing position reconstruction based on the position information output by the prediction tree construction unit 207, to obtain a reconstructed value of the position information of each point in the point cloud data.
The arithmetic coding unit 205 may perform arithmetic coding, in an entropy coding manner, on the position information output by the octree partitioning unit 203, the intersection points fitted by the surface fitting unit 206, or the geometric prediction residual values output by the prediction tree construction unit 207, to generate the geometric code stream; the geometric code stream may also be referred to as a geometry bitstream.
Attribute encoding may be achieved by:
A color conversion (Transform colors) unit 210, a recoloring (Transfer attributes) unit 211, a Region Adaptive Hierarchical Transform (RAHT) unit 212, a generate LOD (Generate LOD) unit 213 together with a lifting transform (Lifting transform) unit 214, a quantize coefficients (Quantize coefficients) unit 215, and an arithmetic coding unit 216.
It should be noted that the point cloud encoder 200 may include more, fewer, or different functional components than those of fig. 4A.
The color conversion unit 210 may be used to convert the RGB color space of points in the point cloud to the YCbCr format or other formats.
The re-coloring unit 211 re-colors the color information using the reconstructed geometric information such that the uncoded attribute information corresponds to the reconstructed geometric information.
After the original values of the attribute information of the points are converted by the recoloring unit 211, either transform unit may be selected to transform the points in the point cloud. The transform units include the RAHT unit 212 and the lifting transform unit 214. The lifting transform depends on the generated level of detail (LOD).
Any one of RAHT transformation and lifting transformation can be understood as predicting attribute information of a point in a point cloud to obtain a predicted value of the attribute information of the point, and further obtaining a residual value of the attribute information of the point based on the predicted value of the attribute information of the point. For example, the residual value of the attribute information of the point may be the original value of the attribute information of the point minus the predicted value of the attribute information of the point.
In one embodiment of the application, the LOD generating unit generates LOD by acquiring Euclidean distance between points according to position information of points in point cloud and dividing the points into different detail expression layers according to Euclidean distance. In one embodiment, the Euclidean distances may be sorted and the different ranges of Euclidean distances divided into different layers of detail representation. For example, a point may be randomly selected as the first detail presentation layer. And then calculating the Euclidean distance between the rest points and the points, and classifying the points with the Euclidean distance meeting the first threshold value requirement as second detail expression layers. And acquiring the mass center of the points in the second detail expression layer, calculating the Euclidean distance between the points except the first detail expression layer and the second detail expression layer and the mass center, and classifying the points with the Euclidean distance conforming to the second threshold value as the third detail expression layer. And so on, all points are classified into the detail expression layer. By adjusting the threshold value of the euclidean distance, the number of points per LOD layer can be made incremental. It should be appreciated that the manner in which the LODs are partitioned may also take other forms, as the application is not limited in this regard.
It should be noted that, the point cloud may be directly divided into one or more detail expression layers, or the point cloud may be divided into a plurality of point cloud slices (slices) first, and then each point cloud slice may be divided into one or more LOD layers.
For example, the point cloud may be divided into a plurality of point cloud slices, and the number of points in each slice may be between 550,000 and 1,100,000. Each point cloud slice can be viewed as a separate point cloud. Each point cloud slice may in turn be divided into a plurality of detail expression layers, each comprising a plurality of points. In one embodiment, the division into detail expression layers may be done according to the Euclidean distance between points.
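A simplified sketch of the distance-based LOD construction described above (O(n²) for clarity; the direction of each threshold test and the choice of seed point are assumptions):

```python
import numpy as np

def build_lods(points: np.ndarray, thresholds: list) -> list:
    """Split points into detail expression layers by Euclidean distance:
    layer 1 is a seed point; each later layer collects the remaining
    points whose distance to the previous layer's reference point or
    centroid satisfies that layer's threshold."""
    lods = [points[:1]]                      # first detail layer: seed point
    ref, remaining = points[0], points[1:]
    for th in thresholds:
        d = np.linalg.norm(remaining - ref, axis=1)
        mask = d <= th
        lods.append(remaining[mask])
        remaining = remaining[~mask]
        if len(lods[-1]):
            ref = lods[-1].mean(axis=0)      # centroid of the newest layer
    if len(remaining):
        lods.append(remaining)               # leftover points form the last layer
    return lods
```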
The quantization unit 215 may be used to quantize residual values of attribute information of points. For example, if quantization unit 215 and RAHT transform unit 212 are connected, quantization unit 215 may be used to quantize RAHT the residual value of the attribute information of the point output by transform unit 212.
The arithmetic coding unit 216 may entropy-encode the residual value of the point attribute information using zero-run coding (Zero run length coding) to obtain an attribute code stream. The attribute code stream may be bit stream information.
Fig. 4B is a schematic block diagram of a point cloud decoder provided by an embodiment of the present application.
As shown in fig. 4B, the decoder 300 may obtain the point cloud code stream from the encoding device, and obtain the position information and attribute information of the points in the point cloud by parsing the code stream. Decoding of the point cloud includes position decoding and attribute decoding.
The position decoding process includes: performing arithmetic decoding on the geometric code stream; constructing an octree and merging; reconstructing the position information of the points to obtain reconstruction information of the point position information; and performing coordinate transformation on the reconstruction information to obtain the position information of the points. The position information of points may also be referred to as the geometric information of points.
The attribute decoding process includes: parsing the attribute code stream to obtain the residual values of the attribute information of the points in the point cloud; inverse-quantizing those residual values to obtain the dequantized residual values; based on the reconstruction information of the point position information obtained in the position decoding process, selecting one of the RAHT inverse transform and the lifting inverse transform to perform point cloud prediction and obtain predicted values; adding the predicted values and the residual values to obtain the reconstructed values of the attribute information of the points; and performing inverse color-space conversion on the reconstructed values to obtain the decoded point cloud.
As shown in fig. 4B, the position decoding may be achieved by:
An arithmetic decoding unit 301, an octree reconstruction (Synthesize octree) unit 302, a surface reconstruction (Synthesize surface approximation) unit 303, a geometric reconstruction (Reconstruct geometry) unit 304, an inverse coordinate transform (Inverse transform coordinates) unit 305, and a prediction tree reconstruction unit 306.
Attribute decoding may be achieved by:
An arithmetic decoding unit 310, an inverse quantize (Inverse quantize) unit 311, an inverse RAHT transform unit 312, a generate LOD (Generate LOD) unit 313, an inverse lifting transform (Inverse lifting) unit 314, and an inverse color transform (Inverse transform colors) unit 315.
It should be noted that decompression is the inverse of compression, and similarly, the functions of the respective units in the decoder 300 can be referred to as the functions of the corresponding units in the encoder 200. In addition, the point cloud decoder 300 may include more, fewer, or different functional components than fig. 4B.
For example, the decoder 300 may divide the point cloud into a plurality of LODs according to the Euclidean distances between points in the point cloud, and then decode the attribute information of the points in each LOD in turn; for example, it calculates the number of zeros (zero_cnt) in the zero-run-length coding technique and decodes the residual based on zero_cnt. The decoder 300 may then dequantize the decoded residual value and add it to the predicted value of the current point to obtain the reconstructed value of that point, until the whole point cloud is decoded. The current point then serves as the nearest neighbor for subsequent points in the LODs, and the reconstructed value of the current point is used to predict the attribute information of the subsequent points.
The foregoing is a basic flow of a point cloud codec under GPCC codec framework, and with the development of technology, some modules or steps of the framework or flow may be optimized, and the present application is applicable to the basic flow of a point cloud codec under GPCC codec framework, but is not limited to the framework and flow.
The following describes octree-based geometric coding and predictive tree-based geometric coding.
The octree-based geometric coding first performs a coordinate transformation on the geometric information so that the point cloud is entirely contained in one bounding box. Quantization is then performed; quantization mainly plays a scaling role. Because quantization rounding makes the geometric information of some points identical, whether to remove duplicate points is decided according to a parameter; quantization and duplicate-point removal together are also called the voxelization process. Next, the bounding box is continuously tree-partitioned (octree/quadtree/binary tree) in breadth-first traversal order, and the occupancy code of each node is encoded. In the implicit geometric partitioning approach, the bounding box (d_x, d_y, d_z) of the point cloud is first computed; assume d_x > d_y > d_z, so the bounding box corresponds to a cuboid rather than a cube. During geometric partitioning, binary tree partitioning is first performed along the x axis, yielding two child nodes each time, until the condition d_x = d_y > d_z is met; then quadtree partitioning is performed along the x and y axes, yielding four child nodes each time, until the condition d_x = d_y = d_z is met; finally octree partitioning is performed until the leaf nodes obtained are 1×1×1 unit cubes, at which point partitioning stops and the points in the leaf nodes are encoded to generate the binary code stream. In binary tree/quadtree/octree partitioning, two parameters K and M are introduced. Parameter K indicates the maximum number of binary tree/quadtree partitions performed before octree partitioning, and parameter M indicates that the minimum block side length during binary tree/quadtree partitioning is 2^M. At the same time, with d_max = max(d_x, d_y, d_z) and d_min = min(d_x, d_y, d_z), parameter K must satisfy K >= d_max − d_min and parameter M must satisfy M >= d_min. These conditions hold because, in the implicit geometric partitioning of G-PCC, the partition priorities are binary tree, then quadtree, then octree; when the node block size no longer satisfies the binary tree/quadtree condition, octree partitioning is always used until the node is divided into the 1×1×1 minimum units of the leaf nodes.
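The partition-type priority described above can be sketched as a simple decision on the log2 side lengths of a node (this sketch ignores the interaction of K and M with the current depth):

```python
def partition_type(dx: int, dy: int, dz: int) -> str:
    """Choose the split for a node with log2 side lengths (dx, dy, dz),
    with priority binary tree > quadtree > octree: only the currently
    longest axes are split until all three are equal."""
    dmax = max(dx, dy, dz)
    longest = sum(d == dmax for d in (dx, dy, dz))
    if longest == 1:
        return "binary tree"   # e.g. dx > dy > dz: split along x only
    if longest == 2:
        return "quadtree"      # e.g. dx == dy > dz: split along x and y
    return "octree"            # dx == dy == dz: cube, split all axes
```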
The octree-based geometric information coding mode can effectively encode the geometric information of the point cloud by using the correlation between spatially adjacent points; however, for relatively flat nodes, or nodes with planar characteristics, plane coding can further improve the coding efficiency of the point cloud geometric information.
Illustratively, as shown in FIG. 5A, (a) the series belongs to the low plane position in the Z-axis direction, and (b) the series belongs to the high plane position in the Z-axis direction. Taking (a) as an example, it can be seen that all four child nodes occupied in the current node are located at the low plane position of the current node in the Z-axis direction, and then the current node can be considered to belong to one Z-plane and be one low plane in the Z-axis direction. Similarly, (b) indicates that the occupied child node in the current node is located at a high plane position of the current node in the Z-axis direction.
The efficiency of octree coding and plane coding is compared using (a) as an example. As shown in fig. 5B, if the octree coding mode is used for (a) in fig. 5A, the occupancy information of the current node is expressed as 11001100, i.e. 8 bits. If the plane coding mode is adopted instead, an identifier must first be coded to indicate that the current node is a plane in the Z-axis direction, and, if it is, the plane position of the current node must be indicated; after that, only the occupancy information of the low-plane child nodes in the Z-axis direction (i.e. the occupancy information of the four child nodes 0, 2, 4, 6) needs to be encoded. Coding the current node with the plane coding mode therefore requires only 6 bits, saving 2 bits compared with the original octree coding. Based on this analysis, plane coding offers a clear coding-efficiency gain over octree coding. Hence, for an occupied node coded in the plane coding mode in some dimension, as shown in fig. 5C, the plane identifier (PlanarMode) and plane position (PlanePos) information of the current node in that dimension are represented first, and the occupancy information of the current node is then coded based on its plane information. Note that PlanarMode_i (i = 0, 1, 2) = 0 indicates that the current node is not a plane in the i-axis direction; when the node is a plane in the i-axis direction, PlanePosition_i = 0 indicates that the plane position is the low plane and 1 indicates the high plane. Illustratively, i = 0 represents the X axis, i = 1 the Y axis, and i = 2 the Z axis.
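Whether an occupied node forms a low or high plane along the Z axis can be checked directly from its child occupancy, as in this sketch (the child index convention index = 4x + 2y + z, under which children 0, 2, 4, 6 form the low Z plane, is an assumption consistent with the text):

```python
def z_plane(child_occupied: list):
    """Return 'low'/'high' if all occupied children share one z position
    (the node is a plane in the Z-axis direction), otherwise None."""
    low = any(child_occupied[i] for i in (0, 2, 4, 6))   # z = 0 children
    high = any(child_occupied[i] for i in (1, 3, 5, 7))  # z = 1 children
    if low and not high:
        return "low"
    if high and not low:
        return "high"
    return None   # occupied children on both z levels: not a Z plane
```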
In the current G-PCC standard, it is determined whether a node satisfies the plane coding condition, and, when the node satisfies the plane coding condition, its plane identification and plane position information are predictively coded.
There are 3 conditions in the current G-PCC for judging whether a node satisfies plane coding; they are described one by one below:
Condition 1: judging according to the plane probability of the node in each dimension.
First, the local area density (local_node_density) of the current node is determined, as well as the plane probability Prob(i) of the current node in each dimension.
When the local area density of the node is less than the threshold Th (Th = 3), the plane probabilities Prob(i) of the current node in the three dimensions are compared with the thresholds Th0, Th1 and Th2, where Th0 < Th1 < Th2 (Th0 = 0.6, Th1 = 0.77, Th2 = 0.88). Below, Eligible_i (i = 0, 1, 2) indicates whether plane coding is enabled in each dimension; the determination of Eligible_i is shown in formula (1), where Eligible_i being true indicates that plane coding is enabled in the i-th dimension:
Eligible_i = Prob(i) >= threshold (1)
Note that the threshold is assigned adaptively; for example, when Prob(0) > Prob(1) > Prob(2), the thresholds are assigned as shown in formula (2):
Eligible_0 = Prob(0) >= Th0
Eligible_1 = Prob(1) >= Th1
Eligible_2 = Prob(2) >= Th2 (2)
The update procedure of local_node_density and the update of Prob (i) are described below.
In one example, Prob(i) is updated by the following formula (3):
Prob(i)_new = (L × Prob(i) + δ(coded node)) / (L + 1) (3)
where L = 255, and δ(coded node) is 1 when the coded node is a plane and 0 otherwise.
In one example, local_node_density is updated by the following formula (4):
local_node_density_new = local_node_density + 4 × numSiblings (4)
where local_node_density is initialized to 4 and numSiblings is the number of sibling nodes of the node (including itself). As shown in fig. 5D, if the current node is the left node and the right node is among its siblings, the number of siblings of the current node is 5 (including itself).
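Formulas (1)–(4) can be combined into the following sketch (L = 255 and the thresholds are the values quoted above; assigning the smallest threshold to the dimension with the largest probability is an assumed generalization of formula (2)):

```python
L = 255
TH = (0.6, 0.77, 0.88)   # Th0 < Th1 < Th2

def update_prob(prob_i: float, coded_node_is_plane: bool) -> float:
    """Formula (3): Prob(i)_new = (L*Prob(i) + delta) / (L + 1)."""
    delta = 1.0 if coded_node_is_plane else 0.0
    return (L * prob_i + delta) / (L + 1)

def update_density(local_node_density: float, num_siblings: int) -> float:
    """Formula (4): local-area density update (initialized to 4)."""
    return local_node_density + 4 * num_siblings

def eligible(probs) -> list:
    """Formulas (1)/(2): enable plane coding per dimension by comparing
    each dimension's plane probability with its threshold."""
    order = sorted(range(3), key=lambda i: probs[i], reverse=True)
    out = [False, False, False]
    for rank, axis in enumerate(order):
        out[axis] = probs[axis] >= TH[rank]
    return out
```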
Condition 2: judging whether the nodes of the current layer satisfy plane coding according to the point cloud density of the current layer.
The density of points in the current layer is used to decide whether to plane-code the nodes of the current layer. Assume the number of points of the current point cloud to be encoded is pointCount and the number of points already reconstructed through IDCM coding is numPointCountRecon. Because the octree is coded in breadth-first traversal order, the number of nodes to be coded in the current layer, nodeCount, can be obtained; let planarEligibleKOctreeDepth indicate whether the current layer enables plane coding. The determination of planarEligibleKOctreeDepth is shown in formula (5):
planarEligibleKOctreeDepth = (pointCount − numPointCountRecon) < nodeCount × 1.3 (5)
When planarEligibleKOctreeDepth is true, all nodes in the current layer are plane-coded; otherwise plane coding is not performed and only octree coding is used.
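Condition 2 reduces to the single comparison of formula (5), sketched below with variable names following the text:

```python
def planar_eligible_k_octree_depth(point_count: int,
                                   num_point_count_recon: int,
                                   node_count: int) -> bool:
    """Plane-code the whole layer when the number of points still to be
    coded, averaged over the layer's nodes, drops below 1.3."""
    return (point_count - num_point_count_recon) < node_count * 1.3
```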
Condition 3: judging whether the current node satisfies plane coding according to the acquisition parameters of the lidar point cloud.
As shown in fig. 5E, the upper large cube node is traversed by two lasers at the same time, so the current node is not a plane in the vertical (Z-axis) direction, while the lower small cube node is small enough not to be traversed by two lasers at the same time, so it may be a plane. Therefore, whether the current node satisfies plane coding can be judged based on the number of lasers that intersect the current node.
The predictive coding of plane identification information and plane position information, for nodes that satisfy the plane coding conditions, is described below.
1. Predictive coding of plane identification information
Three contexts are currently used to encode the plane identification information, i.e. a separate context is designed for the plane identifier in each dimension.
The encoding of plane position information of the non-laser radar point cloud and the laser radar point cloud is described below.
First) encoding of non-lidar point cloud plane position information
1. Predictive coding of plane position information.
The plane position information is predictive coded based on:
(1) Prediction using the occupancy information of neighborhood nodes, with three possible outcomes: predicted to be a low plane, predicted to be a high plane, or no prediction;
(2) whether the spatial distance between the current node and the neighboring node at the same partition depth and the same coordinates is "near" or "far";
(3) the plane position of the neighboring node at the same partition depth and the same coordinates as the current node;
(4) the coordinate dimension (i = 0, 1, 2).
As shown in fig. 5F, if the current node to be encoded is the left node, the neighboring node found at the same octree partition depth and the same vertical coordinate is the right node. Whether the distance between the two nodes is "near" or "far" is determined, and the plane position of that neighboring node is used as reference.
In one example, as shown in fig. 5G, the black node is the current node. If the current node is located at the low plane position of its parent node, the plane position of the current node is predicted as follows:
a) If any of the child nodes 4 to 7 of the underlined node is occupied and all dotted nodes are unoccupied, it is highly probable that there is a plane in the current node and that the plane position is low.
b) If none of the child nodes 4 to 7 of the diagonal-line node is occupied and any dotted node is occupied, it is highly probable that there is a plane in the current node and that the plane position is high.
c) If the child nodes 4 to 7 of the diagonal-line node are all empty and all dotted nodes are empty, the plane position cannot be inferred and is therefore marked as unknown.
d) If any of the child nodes 4 to 7 of the underlined node is occupied and any dotted node is occupied, the plane position cannot be inferred and is therefore marked as unknown.
In another example, as shown in fig. 5H, the black node is the current node. If the node is at the high plane position of its parent node, the plane position of the current node is predicted as follows:
a) If any of the child nodes 4 to 7 of the dotted node is occupied and the diagonal-line node is unoccupied, it is highly probable that there is a plane in the current node and that the plane position is low.
b) If the child nodes 4 to 7 of the dotted node are all unoccupied and the diagonal-line node is occupied, it is highly probable that there is a plane in the current node and that the plane position is high.
c) If the child nodes 4 to 7 of the dotted node are all unoccupied and the diagonal-line node is unoccupied, the plane position cannot be inferred and is therefore marked as unknown.
d) If one of the child nodes 4 to 7 of the dotted node is occupied and the diagonal-line node is occupied, the plane position cannot be inferred and is therefore marked as unknown.
Second), coding of laser radar point cloud plane position information
Fig. 5I illustrates the predictive coding of lidar point cloud plane position information: the plane position of the current node is predicted using the lidar acquisition parameters, the position where the laser ray intersects the current node is quantized into four intervals, and the interval finally serves as the context for the plane position of the current node. The specific calculation process is as follows. Assume the coordinates of the lidar are (x_Lidar, y_Lidar, z_Lidar) and the geometric coordinates of the current point are (x, y, z). First, the vertical tangent value tanθ of the current point relative to the lidar is calculated, as shown in formula (6):
tanθ = (z − z_Lidar) / sqrt((x − x_Lidar)^2 + (y − y_Lidar)^2) (6)
Because each laser has a certain offset angle relative to the lidar, the corrected tangent value tanθ_corr,L of the current node relative to laser L is then calculated; with tanθ_L denoting the tangent of the offset angle of laser L, the calculation is shown in formula (7):
tanθ_corr,L = tanθ − tanθ_L (7)
Finally, the plane position of the current node is predicted using the corrected tangent value of the current node. Specifically, let the tangent value of the lower boundary of the current node be tan(θ_bottom) and that of the upper boundary be tan(θ_top); according to tanθ_corr,L, the plane position is quantized into 4 quantization intervals, which serve as the context of the plane position.
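Formulas (6)–(7) and the four-interval quantization can be sketched as follows; treating the correction as a subtraction of the laser's offset-angle tangent is an assumption consistent with the description above:

```python
import math

def plane_position_context(x, y, z, lidar, tan_theta_laser,
                           tan_bottom, tan_top) -> int:
    """Predict the plane-position context of the current node from the
    lidar acquisition parameters, per formulas (6)-(7)."""
    xl, yl, zl = lidar
    r = max(math.hypot(x - xl, y - yl), 1e-9)   # horizontal distance to lidar
    tan_theta = (z - zl) / r                    # formula (6)
    tan_corr = tan_theta - tan_theta_laser      # formula (7), assumed form
    # Quantize the corrected tangent into 4 intervals between the node's
    # lower- and upper-boundary tangents; the interval index is the context.
    t = (tan_corr - tan_bottom) / (tan_top - tan_bottom)
    return min(3, max(0, int(t * 4)))
```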
However, the octree-based geometric information coding mode achieves a high compression rate only for points having spatial correlation; for points at isolated positions in geometric space, complexity can be greatly reduced by using a direct coding mode (Direct Coding Mode, abbreviated as DCM). For the nodes of the octree, the use of DCM is not signalled by flag bit information but inferred from the parent and neighbor information of the current node. There are three conditions for determining whether the current node qualifies for DCM coding, as shown in fig. 6A:
(1) The current node has no sibling nodes, i.e., the parent node of the current node has only one occupied child node; meanwhile, the grandparent node of the current node has only two occupied child nodes, i.e., the current node has at most one neighbor node.
(2) The parent node of the current node has only one occupied child node, and the six neighbor nodes sharing a face with the current node are also empty nodes.
(3) The number of sibling nodes of the current node is greater than 1.
If the current node is not eligible for DCM coding, octree partitioning is applied to it. If the current node is eligible for DCM coding, the number of points contained in the node is further examined: when the number of points does not exceed the threshold 2, DCM coding is applied to the current node; otherwise octree partitioning continues. When the DCM coding mode is applied, it must first be coded whether the current node is a true isolated point, i.e., IDCM_flag; when IDCM_flag is true, the current node uses DCM coding, otherwise octree coding is still used. When the current node satisfies DCM coding, there are two DCM coding modes: 1. the node contains only one point (or several points that are all duplicates); 2. the node contains two points. Finally, the geometric information of each point needs to be encoded: assuming the side length of a node is 2^d, d bits are needed to encode each component of the geometric coordinates of a point in the node, and these bits are written directly into the code stream. It should be noted that when a laser radar point cloud is encoded, the coordinate information of the three dimensions is predictively encoded using the laser radar acquisition parameters, which can further improve the coding efficiency of the geometric information.
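The decision flow of this paragraph, as a minimal sketch; the Node structure and the helper calls are illustrative placeholders, not the actual codec API:

// Hedged sketch of the DCM decision flow described above.
struct Node { int numPoints; bool allPointsDuplicated; };

bool isEligibleForDcm(const Node&);   // placeholder: conditions (1)-(3) above
bool isTrueIsolatedNode(const Node&); // placeholder: the IDCM_flag decision
void encodeIdcmFlag(bool);            // placeholder entropy call
void dcmEncode(Node&);                // placeholder: direct coordinate coding
void octreePartition(Node&);          // placeholder: continue octree division

void codeNode(Node& node)
{
    if (!isEligibleForDcm(node)) {    // not qualified for DCM
        octreePartition(node);
        return;
    }
    // Qualified: DCM is used only when the node holds at most 2 points
    // (or several points that are all duplicates).
    if (node.numPoints <= 2 || node.allPointsDuplicated) {
        bool idcmFlag = isTrueIsolatedNode(node);
        encodeIdcmFlag(idcmFlag);     // IDCM_flag
        if (idcmFlag) { dcmEncode(node); return; }
    }
    octreePartition(node);            // otherwise keep dividing
}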
The following describes the IDCM encoding process in detail:
When the current node satisfies the direct coding mode (DCM), the number of points numPoints of the current node is coded first; the coding of the number of points proceeds differently according to DirectMode, specifically as follows:
1. If the current node does not meet the requirements of a DCM node (i.e., the number of points is greater than 2 and the points are not duplicates), the process exits directly.
2. If the number numPoints of points contained in the current node is less than or equal to 2, the coding process is as follows:
1) First, encode whether numPoints of the current node is greater than 1;
2) If the current node contains only one point and the geometric coding is lossless, it is additionally encoded that the second point of the current node is not a duplicate point.
3. If the number numPoints of points contained in the current node is greater than 2, the coding process is as follows:
1) First, it is encoded that numPoints of the current node is less than or equal to 1;
2) Next, encode whether the second point of the current node is a duplicate point, and then encode whether the number of duplicate points of the current node is greater than 1; when the number of duplicate points is greater than 1, the number of remaining duplicate points is encoded using exponential Golomb coding.
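Steps 2 and 3 can be summarized in the following sketch; encodeFlag and encodeExpGolomb are placeholder entropy calls, and the offset used for the remaining duplicate count is an assumption, since the source does not state it exactly.

void encodeFlag(bool);        // placeholder single-bit entropy call
void encodeExpGolomb(int);    // placeholder exponential-Golomb call

// Hedged sketch of the numPoints coding steps above.
void encodeNumPoints(int numPoints, bool losslessGeometry)
{
    if (numPoints <= 2) {
        encodeFlag(numPoints > 1);          // step 2.1
        if (numPoints == 1 && losslessGeometry)
            encodeFlag(false);              // step 2.2: second point is not
                                            // a duplicate
    } else {                                // numPoints > 2: duplicate points
        encodeFlag(false);                  // step 3.1: coded as <= 1
        encodeFlag(true);                   // step 3.2: second point is a
                                            // duplicate point
        int numDup = numPoints - 1;         // duplicates of one position
        encodeFlag(numDup > 1);
        if (numDup > 1)
            encodeExpGolomb(numDup - 2);    // remaining-count offset assumed
    }
}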
After the number of points of the current node is encoded, the coordinate information of the points contained in the current node is encoded. The laser-radar-oriented point cloud and the human-eye-oriented point cloud are described separately below.
Human eye-oriented point cloud
1) If the current node contains only one point, the geometric information of the point in the three dimensional directions is directly encoded (bypass encoding).
2) If the current node contains two points, the preferentially encoded coordinate axis directAxis is derived from the geometric coordinates of the points; it should be noted that only the x and y axes are compared here, not the z axis. Assuming the geometric coordinates of the current node are nodePos, the preferentially encoded coordinate axis is determined as shown in formula (8):
directAxis = !(nodePos[0] < nodePos[1]) (8)
That is, the axis on which the node coordinate is smaller is taken as the preferentially encoded coordinate axis directAxis; for example, if nodePos[0] = 3 and nodePos[1] = 5, then directAxis = 0, i.e., the x axis.
Next, the geometric information on the preferentially encoded coordinate axis directAxis is encoded as follows, assuming the geometric bit depth to be encoded on the preferential axis is nodeSizeLog2 and the coordinates of the two points are pointPos[0] and pointPos[1]:
After the preferential encoding axis directAxis is encoded, the geometric coordinates of the current point are directly encoded. Assuming the remaining bit depth to be encoded for each point is nodeSizeLog2, the specific coding process is as follows:
// Bypass-encode the remaining bits of each coordinate component, from the
// most significant bit down to the least significant bit.
for (int axisIdx = 0; axisIdx < 3; ++axisIdx)
    for (int mask = (1 << nodeSizeLog2[axisIdx]) >> 1; mask; mask >>= 1)
        encodePosBit(!!(pointPos[axisIdx] & mask));
Laser radar-oriented point cloud
1) If the current node contains two points, the preferentially encoded coordinate axis directAxis is derived from the geometric coordinates of the points. Assuming the geometric coordinates of the current node are nodePos, the preferentially encoded coordinate axis is determined as shown in formula (9):
directAxis = !(nodePos[0] < nodePos[1]) (9)
That is, the axis on which the node coordinate is smaller is taken as the preferentially encoded coordinate axis directAxis; it should be noted that only the x and y axes are compared here, not the z axis.
Next, the geometric information on the preferentially encoded coordinate axis directAxis is encoded as follows, assuming the geometric bit depth to be encoded on the preferential axis is nodeSizeLog2 and the coordinates of the two points are pointPos[0] and pointPos[1]:
After the preferential encoding axis directAxis is encoded, the geometric coordinates of the current point are encoded.
For a laser radar point cloud, the acquisition parameters of the laser radar are available, and the geometric coordinate information of the current node can be predicted using these acquisition parameters, further improving the coding efficiency of the geometric information of the point cloud. Similarly, the geometric information nodePos of the current node is first used to obtain the principal axis direction for direct encoding, and then the geometric information of the already encoded direction is used to predictively encode the geometric information of the other dimension. Assuming the directly encoded axis is directAxis and the bit depth to be encoded in direct encoding is nodeSizeLog2, the encoding proceeds as follows:
// Bypass-encode all remaining bits of the directly coded axis, from the
// most significant bit down to the least significant bit.
for (int mask = (1 << nodeSizeLog2) >> 1; mask; mask >>= 1)
    encodePosBit(!!(pointPos[directAxis] & mask));
It should be noted here that the geometric precision information in the directAxis direction is encoded in its entirety.
After encoding all bits of the directAxis coordinate direction, the LaserIdx corresponding to the current point (pointLaserIdx in fig. 6B) is first calculated, as well as the LaserIdx of the current node (nodeLaserIdx). Next, the node LaserIdx (nodeLaserIdx) is used to predictively encode the point LaserIdx (pointLaserIdx), where the LaserIdx of a node or point is calculated as follows:
Assuming the geometric coordinates of a point are pointPos, the start coordinates of the Laser rays are LidarOrigin, the number of lasers is LaserNum, the tangent value of each Laser is tanθ_i, and the vertical offset of each Laser is Z_i, then:
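A minimal sketch of this computation, under the assumption that the laser whose calibrated elevation best matches the point is selected; the matching criterion and the names below are illustrative:

#include <cmath>
#include <limits>

// Hedged sketch: pick the laser whose calibrated elevation best matches the
// point. tanTheta[] and zOffset[] are the per-laser parameters tan(theta_i)
// and Z_i named in the text; the nearest-match criterion is an assumption.
int computeLaserIdx(const int pointPos[3], const int lidarOrigin[3],
                    const double* tanTheta, const double* zOffset,
                    int laserNum)
{
    double dx = pointPos[0] - lidarOrigin[0];
    double dy = pointPos[1] - lidarOrigin[1];
    double r = std::sqrt(dx * dx + dy * dy);   // horizontal radius
    int bestIdx = 0;
    double bestErr = std::numeric_limits<double>::max();
    for (int i = 0; i < laserNum; ++i) {
        // vertical position laser i would reach at radius r
        double z = tanTheta[i] * r + zOffset[i];
        double err = std::fabs((pointPos[2] - lidarOrigin[2]) - z);
        if (err < bestErr) { bestErr = err; bestIdx = i; }
    }
    return bestIdx;
}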
After the LaserIdx of the current point is calculated, the node LaserIdx is used to predictively encode the point LaserIdx (pointLaserIdx). After the LaserIdx of the current point is encoded, the geometric information of the current point in the three dimensions is predictively encoded using the acquisition parameters of the laser radar.
As shown in fig. 6C, the specific algorithm is as follows: first, the predicted value of the corresponding horizontal azimuth angle is obtained using the LaserIdx corresponding to the current point; second, the horizontal azimuth angle corresponding to the node is obtained from the node geometric information corresponding to the current point, where the calculation of the horizontal azimuth angle from the node geometric information is shown in formula (10), assuming the geometric coordinates of the node are nodePos:
Using the acquisition parameters of the laser radar, the number of rotation points numPoints of each Laser can be obtained, i.e., the number of points acquired in one full rotation of each Laser ray; the rotation angular speed deltaPhi of each Laser can then be calculated from it, as shown in formula (11):
As shown in fig. 6D, the horizontal azimuth angle of the node and the horizontal azimuth angle of the previously coded point of the Laser corresponding to the current point are used to calculate the predicted horizontal azimuth angle of the current point, as shown in formula (12):
Finally, as shown in fig. 6E, the predicted horizontal azimuth angle, together with the horizontal azimuth angle of the low plane and the horizontal azimuth angle of the high plane of the current node, is used to predictively encode the geometric information of the current node. Specifically:
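Formulas (10) to (12) are not reproduced above; the following sketch states one plausible reading of them, as assumptions: an atan2-based node azimuth, an angular speed of one full turn divided by the per-turn point count, and a prediction that advances the previously coded azimuth by an integer number of deltaPhi steps.

#include <cmath>

const double kPi = 3.14159265358979323846;

// formula (10), assumed: horizontal azimuth of the node position
double nodeAzimuth(const int nodePos[3], const int lidarOrigin[3])
{
    return std::atan2((double)(nodePos[1] - lidarOrigin[1]),
                      (double)(nodePos[0] - lidarOrigin[0]));
}

// formula (11), assumed: angular speed per acquired point of one Laser
double laserDeltaPhi(int numPointsPerTurn)
{
    return 2.0 * kPi / numPointsPerTurn;
}

// formula (12), assumed: step the previously coded azimuth towards the
// node azimuth by an integer number of deltaPhi increments
double predictAzimuth(double phiNode, double phiPrev, double deltaPhi)
{
    double steps = std::round((phiNode - phiPrev) / deltaPhi);
    return phiPrev + steps * deltaPhi;
}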
After the LaserIdx of the point is encoded, the Z-axis direction of the current point is predictively encoded using the LaserIdx corresponding to the current point: the depth information radius of the cylindrical coordinate system is calculated from the x and y information of the current point, then the tangent value and the vertical offset of the Laser corresponding to LaserIdx of the current point are obtained, and the predicted value of the current point in the Z-axis direction, Z_pred, is derived:
Finally, Z_pred is used to predictively encode the geometric information of the current point in the Z-axis direction to obtain the prediction residual Z_res, and finally Z_res is encoded.
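A minimal sketch of this Z-axis prediction and residual computation; whether the lidar origin height is added to the prediction is an assumption, and encodeResidual is a placeholder entropy call.

#include <cmath>

void encodeResidual(int);   // placeholder entropy-coding call

// Hedged sketch: predict Z from the laser's tangent and vertical offset at
// the point's cylindrical radius, then code the residual.
void encodeZ(const int pointPos[3], const int lidarOrigin[3], int laserIdx,
             const double* tanTheta, const double* zOffset)
{
    double dx = pointPos[0] - lidarOrigin[0];
    double dy = pointPos[1] - lidarOrigin[1];
    double radius = std::sqrt(dx * dx + dy * dy);    // from the x, y information
    double zPred = tanTheta[laserIdx] * radius + zOffset[laserIdx]
                 + lidarOrigin[2];                   // origin term assumed
    int zRes = pointPos[2] - (int)std::round(zPred); // prediction residual Z_res
    encodeResidual(zRes);
}

The decoder mirrors this by computing the same Z_pred and recovering the coordinate as pointPos[2] = Z_pred + Z_res, matching the reconstruction step described for the decoding end further below.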
It should be noted that, in the case of geometric lossless coding, when a node is divided down to a leaf node, the number of duplicate points in the leaf node needs to be coded. Finally, the occupancy information of all nodes is encoded to generate a binary code stream. In addition, G-PCC currently introduces a plane coding mode: in the process of geometric division, it is determined whether the child nodes of the current node lie in the same plane, and if the child nodes of the current node satisfy the same-plane condition, they are represented by that plane.
In octree-based geometric decoding, before decoding the occupancy information of each node in breadth-first traversal order, the decoding end uses the reconstructed geometric information to determine whether the current node undergoes plane decoding or IDCM decoding. If the current node satisfies the plane decoding condition, the plane flag and plane position information of the current node are decoded first, and then the occupancy information of the current node is decoded based on the plane information. If the current node satisfies the IDCM decoding condition, the decoding end first decodes whether the current node is a real IDCM node; if it is, the DCM decoding mode of the current node is further parsed, the number of points in the current DCM node is then obtained, and finally the geometric information of each point is decoded. For nodes that satisfy neither plane decoding nor DCM decoding, the occupancy information of the current node is decoded. In this way, the occupancy code of each node is obtained through continuous parsing, and the nodes are divided in turn until 1x1x1 unit cubes are obtained; the points contained in each leaf node are parsed, and finally the geometrically reconstructed point cloud information is recovered.
The following describes the IDCM decoding process in detail:
The same process as in encoding is performed: first, it is determined from a priori information whether the node enables IDCM; the enabling conditions of IDCM are as follows:
(1) The current node has no sibling nodes, i.e., the parent node of the current node has only one occupied child node; meanwhile, the grandparent node of the current node has only two occupied child nodes, i.e., the current node has at most one neighbor node.
(2) The parent node of the current node has only one occupied child node, and the six neighbor nodes sharing a face with the current node are also empty nodes.
(3) The number of sibling nodes of the current node is greater than 1.
When the node satisfies the DCM conditions, it is first decoded whether the current node is a real DCM node, i.e., IDCM_flag; when IDCM_flag is true, the current node uses DCM decoding, otherwise octree decoding is still used.
Next, the number of points numPoints of the current node is decoded, and a specific decoding manner is as follows:
1) First decode whether numPoints of the current node is greater than 1;
2) If the decoding result shows that numPoints of the current node is greater than 1, continue to decode whether the second point is a duplicate point; if the second point is not a duplicate point, it is implicitly inferred that the node satisfying the DCM mode contains only two points;
3) If the decoding result shows that numPoints of the current node is less than or equal to 1, continue to decode whether the second point is a duplicate point; if the second point is not a duplicate point, it is implicitly inferred that the node satisfying the DCM mode contains only one point; if the second point is a duplicate point, it is inferred that the node contains several points which are all duplicates, and it is further decoded (entropy decoding) whether the number of duplicate points is greater than 1; if so, the number of remaining duplicate points is decoded using exponential Golomb decoding.
If the current node does not meet the requirements of a DCM node, i.e., the number of points is greater than 2 and the points are not duplicates, the process exits directly.
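A sketch mirroring decoding steps 1) to 3); decodeFlag and decodeExpGolomb are placeholder entropy calls, and the offset of the remaining duplicate count is an assumption matching the encoder sketch above.

bool decodeFlag();       // placeholder single-bit entropy call
int decodeExpGolomb();   // placeholder exponential-Golomb call

// Hedged sketch of the numPoints decoding above.
int decodeNumPoints()
{
    bool moreThanOne = decodeFlag();     // step 1): numPoints > 1 ?
    bool secondIsDup = decodeFlag();     // steps 2)/3): second point duplicate?
    if (moreThanOne && !secondIsDup)
        return 2;                        // two distinct points
    if (!moreThanOne && !secondIsDup)
        return 1;                        // a single point
    // duplicate case: several points sharing one position
    int numDup = 1;
    if (decodeFlag())                    // duplicate count > 1 ?
        numDup = 2 + decodeExpGolomb();  // remaining-count offset assumed
    return 1 + numDup;
}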
After the number of points of the current node is decoded, the coordinate information of the points contained in the current node is decoded. The laser-radar-oriented point cloud and the human-eye-oriented point cloud are described separately below.
Human eye-oriented point cloud
1) If the current node contains only one point, the geometric information of the point in the three dimensional directions is directly decoded (bypass decoding);
2) If the current node contains two points, the preferentially decoded coordinate axis directAxis is derived from the geometric coordinates of the points; it should be noted that only the x and y axes are compared here, not the z axis. Assuming the geometric coordinates of the current node are nodePos, the preferentially decoded coordinate axis is determined as shown in formula (13):
directAxis = !(nodePos[0] < nodePos[1]) (13)
That is, the axis on which the node coordinate is smaller is taken as the preferentially decoded coordinate axis directAxis.
Next, the geometric information on the preferentially decoded coordinate axis directAxis is decoded as follows, assuming the geometric bit depth to be decoded on the preferential axis is nodeSizeLog2 and the coordinates of the two points are pointPos[0] and pointPos[1]:
After the preferential decoding axis directAxis is decoded, the geometric coordinates of the current point are directly decoded. Assuming the remaining bit depth to be decoded for each point is nodeSizeLog2 and the coordinate information of the point is pointPos, the specific decoding process is as follows:
Laser radar-oriented point cloud
1) If the current node contains two points, the preferentially decoded coordinate axis directAxis is derived from the geometric coordinates of the points. Assuming the geometric coordinates of the current node are nodePos, the preferentially decoded coordinate axis is determined as shown in formula (14):
directAxis = !(nodePos[0] < nodePos[1]) (14)
That is, the axis on which the node coordinate is smaller is taken as the preferentially decoded coordinate axis directAxis; it should be noted that only the x and y axes are compared here, not the z axis.
Next, the geometric information on the preferentially decoded coordinate axis directAxis is decoded as follows, assuming the geometric bit depth to be decoded on the preferential axis is nodeSizeLog2 and the coordinates of the two points are pointPos[0] and pointPos[1]:
After the preferentially decoded axis directAxis is decoded, the geometric coordinates of the current point are decoded.
Likewise, the geometric information nodePos of the current node is used to obtain the principal axis direction for direct decoding, and the geometric information of the already decoded direction is used to decode the geometric information of the other dimension. Assuming the directly decoded axis is directAxis and the bit depth to be decoded in direct decoding is nodeSizeLog2, the decoding proceeds as follows:
It should be noted here that the geometric precision information in the directAxis direction is decoded in its entirety.
After decoding all bits of the directAxis coordinate direction, the LaserIdx of the current node (nodeLaserIdx) is calculated first, and then the node LaserIdx (nodeLaserIdx) is used to predictively decode the point LaserIdx (pointLaserIdx), where the LaserIdx of a node or point is calculated in the same way as at the encoding end. Finally, the LaserIdx prediction residual ResLaserIdx of the current point relative to the node is decoded, and the point LaserIdx is recovered as shown in formula (15):
pointLaserIdx = nodeLaserIdx + ResLaserIdx (15)
After the LaserIdx of the current point is decoded, the geometric information of the current point in the three dimensions is predictively decoded using the acquisition parameters of the laser radar.
Specifically, as shown in fig. 6B, the predicted value of the corresponding horizontal azimuth angle is first obtained using the LaserIdx corresponding to the current point; second, the horizontal azimuth angle corresponding to the node is obtained from the node geometric information corresponding to the current point, where, assuming the geometric coordinates of the node are nodePos, the calculation of the horizontal azimuth angle from the node geometric information is shown in formula (16):
Using the acquisition parameters of the laser radar, the number of rotation points numPoints of each Laser can be obtained, i.e., the number of points acquired in one full rotation of each Laser ray; the rotation angular speed deltaPhi of each Laser can then be calculated from it, as shown in formula (17):
Next, as shown in fig. 6D, the horizontal azimuth angle of the node and the horizontal azimuth angle of the previously coded point of the Laser corresponding to the current point are used to calculate the predicted horizontal azimuth angle of the current point, as shown in formula (18):
Finally, the predicted horizontal azimuth angle, together with the horizontal azimuth angle of the low plane and the horizontal azimuth angle of the high plane of the current node, is used to predictively decode the geometric information of the current node. Specifically:
After the LaserIdx of the current point is decoded, the Z-axis direction of the current point is predictively decoded using the LaserIdx corresponding to the current point: the depth information radius of the cylindrical coordinate system is calculated from the x and y information of the current point, then the tangent value and the vertical offset of the Laser corresponding to LaserIdx of the current point are obtained, and the predicted value of the current point in the Z-axis direction, Z_pred, is derived:
Finally, the geometric information of the current point in the Z-axis direction is reconstructed and recovered using the decoded Z_res and Z_pred.
In the geometric information coding framework based on trisoup (triangle soup), geometric division is also performed first; however, unlike binary tree/quadtree/octree-based geometric information coding, this method does not need to divide the point cloud step by step into unit cubes with a side length of 1x1x1. Instead, the division stops when the side length of a block (sub-block) is W, and, based on the surface formed by the distribution of the point cloud in each block, at most twelve vertices (intersection points) produced by that surface with the twelve edges of the block are obtained. The vertex coordinates of each block are encoded in turn, generating a binary code stream.
In trisoup-based point cloud geometric information reconstruction, when the decoding end reconstructs the point cloud geometric information, the vertex coordinates are decoded first to complete the triangular patch reconstruction; the process is shown in figs. 7A to 7C. In the block shown in fig. 7A there are 3 vertices (v1, v2, v3); a triangular patch set constructed from these 3 vertices in a certain order is called a triangle soup, i.e., trisoup, as shown in fig. 7B. Then the triangular patch set is sampled, and the obtained sampling points are used as the reconstructed point cloud within the block, as shown in fig. 7C.
Geometric coding based on the prediction tree first sorts the input point cloud; the currently adopted sorting methods include unordered, Morton order, azimuth order, and radial distance order. At the encoding end, the prediction tree structure is built in two different ways: using a KD-Tree (a slow, high-efficiency mode), or using laser radar calibration information to assign each point to a different Laser and build the prediction structure according to the different Lasers (a low-delay fast mode). Based on the structure of the prediction tree, each node in the prediction tree is traversed, the geometric position information of the node is predicted by selecting different prediction modes to obtain a prediction residual, and the geometric prediction residual is quantized using quantization parameters. Finally, through continuous iteration, the prediction residuals of the position information of the prediction tree nodes, the prediction tree structure, the quantization parameters, and so on are encoded to generate a binary code stream.
Based on the geometric decoding of the prediction tree, the decoding end reconstructs the prediction tree structure by continuously analyzing the code stream, obtains the geometric position prediction residual information and the quantization parameter of each prediction node by analyzing, dequantizes the prediction residual, recovers to obtain the reconstructed geometric position information of each node, and finally completes the geometric reconstruction of the decoding end.
After the geometric coding is finished, the geometric information is reconstructed. Currently, attribute coding is mainly performed for color information. First, the color information is converted from the RGB color space to the YUV color space. Then the point cloud is recolored with the reconstructed geometric information so that the uncoded attribute information corresponds to the reconstructed geometric information. In color information coding there are two main transform methods: one is the distance-based lifting transform, which depends on LOD (Level of Detail) division, and the other is the direct RAHT (Region Adaptive Hierarchical Transform); both convert the color information from the spatial domain to the frequency domain, obtain high-frequency and low-frequency coefficients through the transform, and finally quantize and encode the coefficients to generate a binary code stream.
When the geometric information is used to predict the attribute information, the Morton code can be used for nearest-neighbor search, and the Morton code corresponding to each point in the point cloud can be obtained from its geometric coordinates. The specific method of calculating the Morton code is as follows. For a three-dimensional coordinate whose components are each represented by a d-bit binary number, the three components can be written as shown in formula (19):

x = (x_1 x_2 … x_d)_2, y = (y_1 y_2 … y_d)_2, z = (z_1 z_2 … z_d)_2 (19)

where x_1, y_1, z_1 are the binary values of the most significant bits of x, y, z, and x_d, y_d, z_d are those of the least significant bits. The Morton code M is obtained by interleaving the bits of x, y, z in turn, from the most significant bit to the least significant bit, as shown in formula (20):

M = (x_1 y_1 z_1 x_2 y_2 z_2 … x_d y_d z_d)_2 = (m_1 m_2 … m_3d)_2 (20)

where m_1 to m_3d are the bits of M from the most significant to the least significant. After the Morton code M of each point in the point cloud is obtained, the points in the point cloud are arranged in ascending order of M, and the weight w of each point is set to 1.
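A small sketch of this bit interleaving (formulas (19) and (20)); it assumes at most 21 bits per component so that the result fits in 64 bits:

#include <cstdint>

// Build the Morton code by interleaving the x, y, z bits from the most
// significant bit downwards, x bit first in each triplet as stated above.
uint64_t mortonCode(uint32_t x, uint32_t y, uint32_t z, int d) // d <= 21
{
    uint64_t m = 0;
    for (int i = d - 1; i >= 0; --i) {
        m = (m << 1) | ((x >> i) & 1u);
        m = (m << 1) | ((y >> i) & 1u);
        m = (m << 1) | ((z >> i) & 1u);
    }
    return m;
}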
The G-PCC common test conditions total 4:
Condition 1, the geometric position is limited-lossy and the attribute is limited-lossy;
Condition 2, the geometric position is lossless and the attribute is lossy;
Condition 3, the geometric position is lossless and the attribute is limited-lossy;
Condition 4, the geometric position is lossless and the attribute is lossless.
The common test sequences comprise four types: Cat1A, Cat1B, Cat3-fused, and Cat3-frame. The Cat3-frame point clouds contain only reflectivity attribute information, the Cat1A and Cat1B point clouds contain only color attribute information, and the Cat3-fused point clouds contain both color and reflectivity attribute information.
G-PCC is divided into an octree coding branch and a prediction tree coding branch according to the algorithm adopted for geometric compression.
In the octree coding branch, at the encoding end, the bounding box is divided in turn to obtain sub-cubes, and the non-empty sub-cubes (those containing points of the point cloud) are divided further until the leaf nodes obtained by division are 1x1x1 unit cubes. Under the geometric lossless coding condition, the points contained in the leaf nodes also need to be coded; finally the coding of the geometric octree is completed, generating a binary code stream. At the decoding end, the occupancy code of each node is obtained through continuous parsing in breadth-first traversal order, and the nodes are divided in turn until 1x1x1 unit cubes are obtained; under the geometric lossless decoding condition, the points contained in each leaf node are parsed, and finally the geometrically reconstructed point cloud information is recovered.
In the prediction tree coding branch, the prediction tree structure is built at the encoding end in two different ways: using a KD-Tree (a slow, high-efficiency mode), or using laser radar calibration information to assign each point to a different Laser and build the prediction structure according to the different Lasers (a low-delay fast mode). Based on the structure of the prediction tree, each node in the prediction tree is traversed, the geometric position information of the node is predicted by selecting different prediction modes to obtain a prediction residual, and the geometric prediction residual is quantized using quantization parameters. Finally, through continuous iteration, the prediction residuals of the position information of the prediction tree nodes, the prediction tree structure, the quantization parameters, and so on are encoded to generate a binary code stream. At the decoding end, the prediction tree structure is reconstructed by continuously parsing the code stream, the geometric position prediction residual information and quantization parameters of each prediction node are obtained by parsing, the prediction residuals are dequantized, the reconstructed geometric position information of each node is recovered, and finally the geometric reconstruction at the decoding end is completed.
The AVS codec framework is described below.
In the point cloud AVS encoder framework, the geometric information of the point cloud and the attribute information corresponding to each point are separately encoded.
Fig. 8A is a schematic diagram of the AVS encoding framework, and fig. 8B is a schematic diagram of the AVS decoding framework. As shown in fig. 8A, the geometric information is first subjected to coordinate transformation so that the point cloud is entirely contained in one bounding box. Before the preprocessing process, it is determined according to the parameter configuration whether to divide the whole point cloud sequence into multiple slices (point cloud slices), and each divided slice is treated as an independent point cloud for serial processing. The preprocessing process involves quantization and removal of duplicate points. Quantization mainly plays a scaling role; since quantization rounding makes the geometric information of some points identical, whether to remove the duplicate points is determined according to the parameters. Next, the bounding box is divided (octree/quadtree/binary tree) in breadth-first traversal order, and the occupancy code of each node is encoded.
As shown in fig. 8B, in the octree-based geometric coding framework, bounding boxes are sequentially divided to obtain subcubes, the subcubes which are not empty (including points in the point cloud) are continuously divided until division is stopped when a leaf node obtained by division is a unit cube of 1x1x1, then, in the case of geometric lossless coding, points included in the leaf node are coded, and finally, the coding of the geometric octree is completed, and a binary code stream is generated. In the geometric decoding process based on octree, the decoding end obtains the occupation code of each node through continuous analysis according to the breadth-first traversal sequence, and continuously divides the nodes in sequence until the unit cubes of 1x1x1 are obtained by division, the division is stopped, the points contained in each leaf node are obtained through analysis, and finally the geometric reconstruction point cloud information is recovered.
In the current AVS geometry coding, there are two coding modes, one is octree coding and the other is prediction tree coding.
Octree coding: if octree coding is employed, there are two context coding models, context model one for cat1-A and cat2 point cloud sequences, and context model two for cat1-B and cat3 sequences.
Context model one is described below. Model one comprises sub-layer neighbor prediction of the current point and neighbor prediction of the current node layer.
1) Sublayer neighbor prediction for current point
In the octree breadth-first traversal division mode, the neighbor information available when encoding the child nodes of the current point comprises the neighbor child nodes in the three directions left, front, and lower. The context model of the sub-node layer is designed from the occupancy of reference nodes at the same layer as the child node to be encoded: 3 coplanar nodes, 3 collinear nodes, one node at a distance of two node side lengths in the negative direction of the dimension with the shortest node side length, and 1 co-sited node. Taking the shortest node side length in the X dimension as an example, the reference nodes selected by each child node are shown in fig. 9A. The dashed-box nodes are current nodes, the gray nodes are the current child nodes to be encoded, and the solid-box nodes are the reference nodes selected by each child node.
In detail, the occupancy of 7 nodes is considered: the 3 coplanar nodes, the 3 collinear nodes, and the node at a distance of two node side lengths from the child node to be encoded in the negative direction of the shortest-edge dimension, giving 2^7 = 128 cases. Excluding the case where none of them is occupied leaves 2^7 − 1 = 127 cases, and 1 context is allocated to each. If none of the 7 nodes is occupied, the occupancy of the co-sited neighbor node is considered. The co-sited neighbor has 2 possibilities, occupied or unoccupied; 1 context is allocated separately to the occupied case, and if the co-sited neighbor is also unoccupied, the neighbor occupancy of the current node layer, described below, is considered. That is, the sub-node-layer neighbors correspond to 127 + 2 − 1 = 128 contexts in total.
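A minimal sketch of this context selection; the bit-mask encoding of the 7 reference nodes and the fall-through return value are illustrative assumptions:

// Hedged sketch: 7 reference neighbours (3 coplanar, 3 collinear, 1 node at
// two side lengths in the negative direction) give 127 occupied patterns,
// the co-sited neighbour adds 1 context, and the all-empty case falls
// through to the current-node-layer contexts described next.
int childLayerContext(unsigned sevenNeighbourMask, bool coSitedOccupied)
{
    if (sevenNeighbourMask != 0)        // 7-bit occupancy pattern, 1..127
        return (int)sevenNeighbourMask - 1;  // contexts 0..126
    if (coSitedOccupied)
        return 127;                     // dedicated co-sited context
    return -1;                          // defer to node-layer contexts
}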
2) Neighbor prediction for current node layer
If none of the 8 reference nodes at the same layer as the child node to be encoded is occupied, the occupancy of the four groups of neighbors of the current node layer, shown in fig. 9B, is considered. The dashed-box node is the current node, and the solid-box nodes are the neighbor nodes.
For the current node layer, the context is determined according to the following steps:
1. First consider the 3 coplanar neighbors at the upper right of the current node. Their occupancy gives 2^3 = 8 cases; one context is allocated to each case other than all-unoccupied, and taking into account the position of the child node to be encoded within the current node, this group of neighbor nodes provides (8 − 1) × 8 = 56 contexts in total. If none of the 3 upper-right coplanar neighbors of the current point is occupied, the remaining three groups of neighbors of the current node layer continue to be considered.
2. Consider the distance of the most recently occupied node from the current node.
The specific correspondence between the distribution of neighboring nodes and the distance is shown in table 2.
TABLE 2 Correspondence between current node layer occupancy and distance
| Current node layer occupancy | Distance |
| The left-front-lower coplanar neighbor or the right-upper-rear collinear neighbor is occupied | 1 |
| The lower-left coplanar neighbor, the upper-right collinear neighbor, or the lower-left collinear neighbor is occupied | 2 |
| None of the four neighbors of the current node layer is occupied | 3 |
As can be seen from table 2, there are 3 distance values in total. 1 context is allocated to each of the 3 cases, and considering the position of the child node to be encoded within the current node, there are 3 × 8 = 24 contexts in total.
So far, this set of context models allocates 128 + 56 + 24 = 208 contexts in total.
Context model two is described below.
This method uses a two-layer context reference configuration, as shown in formula (21), where the first layer is the occupancy of coded neighboring blocks of the parent node of the current sub-block to be coded (i.e., ctxIdxParent), and the second layer is the occupancy of neighboring coded blocks at the same depth as the current sub-block to be coded (i.e., ctxIdxChild).
First, for each sub-block to be encoded, the second-layer ctxIdxChild is computed as shown in formula (22), where C_i^1 represents the occupancy of the 3 coded sub-blocks at a distance of 1 from the current sub-block.
idx=LUT[ctxIdxParent][ctxIdxChild] (21)
Next, for the first-layer ctxIdxParent, the adjacent parent blocks coplanar and collinear with the parent block are found by table lookup according to the relative position of each sub-block, and ctxIdxParent is calculated from their occupancy according to formula (23). Fig. 9C is a schematic diagram of the 6 adjacent parent blocks corresponding to each sub-block; as shown in fig. 9C, each sub-block shows the relative positional relationship of the 6 adjacent parent blocks found for the i-th sub-block, comprising 3 coplanar parent blocks (P_i,0, P_i,1, P_i,2) and 3 collinear parent blocks (P_i,3, P_i,4, P_i,5). The positional relationship between each sub-block and its adjacent parent blocks is obtained by means of table 3, the numbers in table 3 corresponding to the Morton sequence numbers in fig. 9D, taking into account the different sub-block positions and the rotational symmetry about the geometric center. Fig. 9D is a schematic diagram of the 18 neighboring blocks around the current block to be encoded and their Morton sequence numbers; as can be seen from fig. 9D, with the current block at the center, the method has a larger receptive field, and at most 18 neighboring parent blocks around the current block to be encoded can be utilized. Formula (23) combines the permutations of the occupancy of the 3 coplanar parent blocks with the sum of the occupancy of the 3 collinear parent blocks.
Therefore, the number of contexts used in this method is at most 2^3 × 2^5 = 256.
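A sketch of the two-layer lookup of formula (21); since formulas (22) and (23) are not reproduced above, the bit packing below (3 coplanar occupancy bits combined with the count of occupied collinear parents, 8 × 4 = 2^5 values, and 3 child bits, 2^3 values) is an assumption consistent with the context count just stated:

#include <cstdint>

// Hedged sketch of the two-layer context derivation.
int contextIndex(unsigned coplanarBits3, int collinearCount,
                 unsigned childBits3, const uint8_t LUT[32][8])
{
    int ctxIdxParent = (int)(coplanarBits3 << 2) | collinearCount; // 0..31
    int ctxIdxChild  = (int)childBits3;                            // 0..7
    return LUT[ctxIdxParent][ctxIdxChild];                         // formula (21)
}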
Table 3 is the relationship of child block i to its adjacent parent block j, with the numbers in the table corresponding to the Morton order number in FIG. 9D.
TABLE 3
Predictive tree coding: if predictive tree coding is adopted, the encoding end first performs Morton code sorting using the geometric information of the point cloud, and then predictively encodes the geometric information of the point cloud using a KD-Tree; similar to a single-chain structure, the geometric information of child nodes is predictively encoded from their parent nodes. As shown in fig. 9E, the prediction tree adopts a single-chain structure in which each tree node has only one child node, except for the unique leaf node. Except for the root node, which is predicted by default, every other node obtains its geometric prediction from its parent node.
In the process of multi-way tree geometric coding, the isolated point direct coding mode takes effect when the current block simultaneously meets the following three conditions:
1. the isolated point direct coding mode identifier in the geometric header information is 1;
2. Only one point cloud data point is contained in the current block;
3. The sum of the numbers of Morton code bits still to be encoded for the point within the current block is greater than twice the number of directions that have not reached the minimum side length.
The branch is entered when all three conditions are met. A flag identification bit is introduced to indicate whether the current node uses the isolated point coding mode, and the flag is entropy coded using a context. If flag is True, the geometric coordinates of the point are directly encoded using the isolated point mode, and octree division ends. If flag is False, the occupancy code is encoded and octree partitioning continues.
In certain cases, the flag can be inferred to be False without encoding. If the parent block of the current block was already allowed to use the isolated point coding mode and the current block is the only child node of the parent block, then the current block cannot contain an isolated point. Therefore, the bit encoding the flag can be omitted in this case.
After the flag identification bit is encoded, since the current block contains only one point cloud point, the unencoded bits of the Morton code corresponding to the geometric coordinates of that point are encoded directly. Assuming the remaining bit depth to be encoded for the point is nodeSizeLog2, the specific coding process is as follows:
// Bypass-encode the remaining bits of each coordinate component, from the
// most significant bit down to the least significant bit.
for (int axisIdx = 0; axisIdx < 3; ++axisIdx)
    for (int mask = (1 << nodeSizeLog2[axisIdx]) >> 1; mask; mask >>= 1)
        encodePosBit(!!(pointPos[axisIdx] & mask));
After the geometric coding is finished, the geometric information is reconstructed. Currently, attribute coding is mainly performed for color and reflectivity information. As shown in fig. 8A, the encoding end first determines whether to perform color space conversion; if so, the color information is converted from the RGB color space to the YUV color space. Then the reconstructed point cloud is recolored using the original point cloud so that the uncoded attribute information corresponds to the reconstructed geometric information. Color information coding is divided into two modules: attribute prediction and attribute transform. The attribute prediction process is as follows: the point cloud is first reordered and then differentially predicted. There are two reordering methods, Morton reordering and Hilbert reordering; Hilbert reordering is applied to cat1A and cat2 sequences, and Morton reordering to cat1B and cat3 sequences. Attribute prediction is performed on the ordered point cloud in a differential manner, and finally the prediction residuals are quantized and entropy coded to generate a binary code stream. The attribute transform process is as follows: first, a wavelet transform is applied to the point cloud attributes and the transform coefficients are quantized; second, the attribute reconstruction values are obtained through dequantization and inverse wavelet transform; then the difference between the original attributes and the attribute reconstruction values is calculated to obtain the attribute residuals, which are quantized; finally, the quantized transform coefficients and attribute residuals are entropy coded to generate a binary code stream.
General test conditions for AVS PCC are described below.
The AVS common test conditions total 4:
Condition 1, the geometric position is limited-lossy and the attribute is limited-lossy;
Condition 2, the geometric position is lossless and the attribute is lossy;
Condition 3, the geometric position is lossless and the attribute is limited-lossy;
Condition 4, the geometric position is lossless and the attribute is lossless.
The common test sequences comprise five types: Cat1A, Cat1B, Cat1C, Cat2-frame, and Cat3. The Cat1A and Cat2-frame point clouds contain only reflectivity attribute information, the Cat1B and Cat3 point clouds contain only color attribute information, and the Cat1C point clouds contain both color and reflectivity attribute information.
There are 4 technical routes, distinguished by the algorithm adopted for attribute compression.
Technical route 1, prediction branch: attribute compression adopts an intra-prediction-based method:
At the encoding end, processing points in the point cloud according to a certain sequence (such as a point cloud original acquisition sequence, a Morton sequence, a Hilbert sequence and the like), firstly obtaining an attribute predicted value by adopting a prediction algorithm, and then quantizing the attribute residual according to the attribute value and the attribute predicted value to generate a quantized residual, and finally encoding the quantized residual;
At the decoding end, processing points in the point cloud according to a certain sequence (original acquisition sequence of the point cloud, morton sequence, hilbert sequence and the like), firstly adopting a prediction algorithm to obtain an attribute predicted value, then decoding to obtain a quantized residual, then performing inverse quantization on the quantized residual, and finally obtaining an attribute reconstruction value according to the attribute predicted value and the inverse quantized residual.
Technical route 2, prediction-transform branch (resource-limited): attribute compression adopts a method based on intra prediction and DCT transform; when encoding the quantized transform coefficients, the maximum number of points is limited to X (e.g., 4096), i.e., the coefficients are encoded in groups of at most X points:
at the encoding end, processing points in the point cloud according to a certain sequence (such as a point cloud original acquisition sequence, a Morton sequence, a Hilbert sequence and the like), dividing the whole point cloud into a plurality of subgroups with the maximum length of Y (such as 2), combining the subgroups into a plurality of groups (the number of points in each group is not more than X, such as 4096), obtaining an attribute predicted value by adopting a prediction algorithm, obtaining an attribute residual according to the attribute value and the attribute predicted value, performing DCT (discrete cosine transform) on the attribute residual by taking the subgroups as a unit to generate a transformation coefficient, quantizing the transformation coefficient to generate a quantized transformation coefficient, and finally encoding the quantized transformation coefficient by taking the groups as a unit;
At the decoding end, the points in the point cloud are processed according to a certain sequence (such as the original acquisition sequence of the point cloud, the Morton sequence, the Hilbert sequence and the like), the whole point cloud is divided into a plurality of subgroups with the maximum length of Y (such as 2), the subgroups are combined into a plurality of groups (the points in each group are not more than X, such as 4096), the quantized transformation coefficients are decoded and obtained by taking the groups as units, the attribute predicted values are obtained by adopting a prediction algorithm, the quantized transformation coefficients are inversely quantized and inversely transformed by taking the subgroups as units, and finally the attribute reconstruction values are obtained according to the attribute predicted values and the inversely quantized and inversely transformed coefficients.
Technical route 3, prediction-transform branch (resource-unlimited): attribute compression adopts a method based on intra prediction and DCT transform; when encoding the quantized transform coefficients, there is no maximum-point limit X, i.e., all the coefficients are encoded together:
At the encoding end, processing points in the point cloud according to a certain sequence (such as a point cloud original acquisition sequence, a Morton sequence, a Hilbert sequence and the like), firstly dividing the whole point cloud into a plurality of subgroups with the maximum length of Y (such as 2), then obtaining an attribute prediction value by adopting a prediction algorithm, obtaining an attribute residual according to the attribute value and the attribute prediction value, performing DCT (discrete cosine transform) on the attribute residual by taking the subgroups as a unit to generate a transformation coefficient, quantizing the transformation coefficient to generate a quantized transformation coefficient, and finally encoding the quantized transformation coefficient of the whole point cloud;
At the decoding end, the points in the point cloud are processed according to a certain sequence (such as the original acquisition sequence of the point cloud, the Morton sequence, the Hilbert sequence and the like), the whole point cloud is firstly divided into a plurality of groups with the maximum length of Y (such as 2), quantized transformation coefficients of the whole point cloud are obtained through decoding, then an attribute predicted value is obtained through a prediction algorithm, the quantized transformation coefficients are subjected to inverse quantization and inverse transformation by taking the groups as units, and finally an attribute reconstruction value is obtained according to the attribute predicted value and the coefficients subjected to inverse quantization and inverse transformation.
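Routes 2 and 3 differ only in whether the subgroups are gathered into groups of at most X points before entropy coding. A minimal sketch of the route-2 grouping, assuming the example values X = 4096 and Y = 2 from the text; the Point type and all stage functions are illustrative placeholders:

#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { /* geometry and attributes, omitted in this sketch */ };

// Placeholder hooks for the prediction, transform, and entropy stages.
std::vector<int> predictResiduals(const std::vector<Point>&, size_t, size_t);
std::vector<int> dctTransform(const std::vector<int>&);
void quantize(std::vector<int>&);
void encodeGroupCoefficients(size_t, size_t);

// Hedged sketch: subgroups of at most Y points are merged into groups of at
// most X points, with DCT per subgroup and coefficient coding per group.
void encodeAttributesRoute2(const std::vector<Point>& ordered,
                            size_t X = 4096, size_t Y = 2)
{
    for (size_t g = 0; g < ordered.size(); g += X) {        // one group
        size_t gEnd = std::min(ordered.size(), g + X);
        for (size_t s = g; s < gEnd; s += Y) {              // one subgroup
            size_t sEnd = std::min(gEnd, s + Y);
            std::vector<int> residual = predictResiduals(ordered, s, sEnd);
            std::vector<int> coeff = dctTransform(residual); // per-subgroup DCT
            quantize(coeff);                                 // quantized coeffs
        }
        encodeGroupCoefficients(g, gEnd);   // code the group's coefficients
    }
}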
Technical route 4, multi-layer transform branch: attribute compression adopts a method based on multi-layer wavelet transform:
At the encoding end, carrying out multi-layer wavelet transformation on the whole point cloud to generate a transformation coefficient, quantizing the transformation coefficient to generate a quantized transformation coefficient, and finally encoding the quantized transformation coefficient of the whole point cloud;
at the decoding end, the quantized transformation coefficient of the whole point cloud is obtained through decoding, and then the quantized transformation coefficient is subjected to inverse quantization and inverse transformation, so that an attribute reconstruction value is obtained.
When the current node is directly coded, the encoding end encodes the position information of the points in the current node after determining that the current node is eligible for direct coding and decoding. However, when the geometric information of the points in the current node is coded at present, inter-frame information is not considered, which reduces the coding and decoding performance of the point cloud.
In order to solve the above technical problem, in the embodiment of the present application, when the current node in the current frame is coded, N prediction nodes of the current node are determined in the prediction reference frame of the current frame, and the coordinate information of the points in the current node is predictively coded and decoded based on the geometric coding and decoding information of the points in the N prediction nodes. That is, the embodiment of the present application optimizes the nodes for which DCM direct coding and decoding is performed: considering the correlation in the time domain between adjacent frames, the geometric information of the points in the IDCM node to be coded (i.e., the current node) is predictively coded and decoded using the geometric information of the prediction nodes in the prediction reference frame, which further improves the coding and decoding efficiency of the geometric information of the point cloud.
The following describes a point cloud encoding and decoding method according to an embodiment of the present application with reference to a specific embodiment.
Firstly, taking a decoding end as an example, the point cloud decoding method provided by the embodiment of the application is introduced.
Fig. 10 is a flowchart of a point cloud decoding method according to an embodiment of the present application. The point cloud decoding method of the embodiment of the application can be completed by the point cloud decoding device or the point cloud decoder shown in fig. 3 or fig. 4B or 8B.
As shown in fig. 10, the point cloud decoding method according to the embodiment of the present application includes:
s101, determining N prediction nodes of a current node in a prediction reference frame of a current frame to be decoded.
The current node is a node to be decoded in the current frame to be decoded.
As can be seen from the above, the point cloud includes geometric information and attribute information, and decoding of the point cloud includes geometric decoding and attribute decoding. The embodiment of the application relates to geometric decoding of point clouds.
In some embodiments, the geometric information of the point cloud is also referred to as position information of the point cloud, and thus, the geometric decoding of the point cloud is also referred to as position decoding of the point cloud.
In the octree-based encoding method, the encoding end constructs an octree structure of the point cloud based on the geometric information of the point cloud. As shown in fig. 11, a minimum cuboid surrounding the point cloud is used; the bounding box is first octree-divided to obtain 8 nodes, and the occupied nodes among the 8 nodes, i.e., the nodes containing points, continue to be octree-divided until the division reaches voxel-level positions, for example 1x1x1 cubes. The point cloud octree structure obtained by division comprises multiple layers of nodes, for example N layers; when encoding, the occupancy information of each layer is encoded layer by layer until the leaf nodes of the voxel level in the last layer are encoded. That is, in octree coding, the point cloud is divided by the octree, the points in the point cloud are finally assigned to voxel-level leaf nodes of the octree, and the coding of the point cloud is achieved by coding the entire octree.
Correspondingly, the decoding end firstly decodes the geometric code stream of the point cloud to obtain the occupation information of the root node of the octree of the point cloud, and determines the child node included in the root node, namely the node included in the layer 2 of the octree based on the occupation information of the root node. And then decoding the geometric code stream to obtain the occupation information of each node in the layer 2, determining the node included in the layer 3 of the octree based on the occupation information of each node, and so on.
However, the octree-based geometric information coding mode has high compression rate on the points with correlation in the space, and for the points at isolated positions in the geometric space, the complexity can be greatly reduced and the coding and decoding efficiency can be improved by using a direct coding mode.
The direct coding mode directly codes the geometric information of the points included in a node; if a node includes many points, the compression effect of the direct coding mode is poor. Therefore, for a node in the octree, it is first determined whether the node can adopt the direct coding mode before direct coding is performed. If it is determined that the node can be coded in the direct coding mode, the geometric information of the points included in the node is directly coded in the direct coding mode. If it is determined that the node cannot be coded in the direct coding mode, the node continues to be divided in the octree manner.
Specifically, the encoding end first determines whether the node is eligible for direct encoding. If so, it determines whether the number of points of the node is less than or equal to a preset threshold; if it is, the node can be coded in the direct coding mode. Then, the number of points included in the node and the geometric information of the points are encoded into the code stream. Correspondingly, after determining that the node is eligible for direct decoding, the decoding end decodes the code stream to obtain the number of points of the node and the geometric information of each point, thereby completing the geometric decoding of the node.
At present, when the position information of the points in the current node is predictively coded, inter-frame information is not considered, so the coding performance of the point cloud is low.
In order to solve the above problems, in the embodiment of the present application, a decoding end predicts and decodes the position information of the point in the current node based on the inter-frame information corresponding to the current node, thereby improving the decoding efficiency and decoding performance of the point cloud.
Specifically, the decoding end first determines N prediction nodes of the current node in the prediction reference frame of the current frame to be decoded.
It should be noted that the current frame to be decoded is a point cloud frame; in some embodiments, it is also referred to as the current frame or the current point cloud frame to be decoded. The current node may be understood as any non-empty, non-leaf node in the current frame to be decoded. That is, the current node is not a leaf node in the octree corresponding to the current frame to be decoded, i.e., it is any intermediate node of the octree, and it is a non-empty node, i.e., it includes at least 1 point.
In the embodiment of the application, when a decoding end decodes a current node in a current frame to be decoded, firstly determining a prediction reference frame of the current frame to be decoded, and determining N prediction nodes of the current node in the prediction reference frame. For example, fig. 12 shows a predicted node of the current node in the predicted reference frame.
It should be noted that, in the embodiment of the present application, the number of prediction reference frames of the current frame to be decoded is not limited, for example, the current frame to be decoded has one prediction reference frame, or the current frame to be decoded has a plurality of prediction reference frames. Meanwhile, the embodiment of the application does not limit the number N of the predicted nodes of the current node, and is specifically determined according to actual needs.
The embodiment of the application also does not limit the specific mode of determining the prediction reference frame of the current frame to be decoded.
In some embodiments, the previous frame or frames of the current frame to be decoded are determined as prediction reference frames for the current frame to be decoded.
For example, if the current frame to be decoded is a P frame, the inter reference frames of a P frame include the frame preceding it (i.e., a forward frame); therefore, the frame preceding the current frame to be decoded (i.e., the forward frame) may be determined as the prediction reference frame of the current frame to be decoded.
For another example, if the current frame to be decoded is a B frame, the inter reference frames of a B frame include the frame preceding it (i.e., a forward frame) and the frame following it (i.e., a backward frame); therefore, the frame preceding the current frame to be decoded (i.e., the forward frame) can be determined as the prediction reference frame of the current frame to be decoded.
In some embodiments, the next or next several decoded frames of the current frame to be decoded are determined as prediction reference frames for the current frame to be decoded.
For example, if the current frame to be decoded is a B frame, a frame subsequent to the current frame to be decoded may be determined as a prediction reference frame of the current frame to be decoded.
In some embodiments, a previous decoded frame or frames of the current frame to be decoded and a next decoded frame or frames of the current frame to be decoded are determined as prediction reference frames for the current frame to be decoded.
For example, if the current frame to be decoded is a B frame, a previous frame and a next frame of the current frame to be decoded may be determined as prediction reference frames of the current frame to be decoded, where the current frame to be decoded has 2 prediction reference frames.
The specific process of determining the N prediction nodes of the current node in the prediction reference frames of the current frame to be decoded in S101-A is described below, taking as an example that the current frame to be decoded has K prediction reference frames.
In some embodiments, the decoding end selects at least one prediction reference frame from the K prediction reference frames based on the occupancy information of the nodes in the current frame to be decoded and the occupancy information of the nodes in each of the K prediction reference frames, and searches for the prediction nodes of the current node in the selected prediction reference frame(s). For example, the at least one prediction reference frame whose node occupancy information is closest to that of the current frame to be decoded is selected from the K prediction reference frames, and the prediction nodes of the current node are then searched for in that at least one prediction reference frame.
In some embodiments, the decoding side may determine N predicted nodes of the current node by the following steps S101-A1 and S101-A2:
S101-A1, determining at least one prediction node of a current node in a kth prediction reference frame aiming at the kth prediction reference frame in K prediction reference frames, wherein K is a positive integer less than or equal to K, and K is a positive integer;
S101-A2, determining N prediction nodes of the current node based on at least one prediction node of the current node in K prediction reference frames.
In this embodiment, the decoding end determines at least one prediction node of the current node from each of the K prediction reference frames, and finally summarizes the at least one prediction node of each of the K prediction reference frames to obtain N prediction nodes of the current node.
The process of determining at least one prediction node of the current node in each of the K prediction reference frames is the same; for convenience of description, the kth prediction reference frame of the K prediction reference frames is taken as an example.
The following describes a specific procedure for determining at least one predicted node of the current node in the kth predicted reference frame in S101-A1.
The embodiment of the application does not limit the specific mode of determining at least one predicted node of the current node in the kth predicted reference frame by the decoding end.
In one mode, one prediction node of the current node is determined in the kth prediction reference frame. For example, a node of the kth prediction reference frame at the same division depth as the current node is determined as the prediction node of the current node.
By way of example, assuming that the current node is located at level 3 of the octree of the current frame to be decoded, each node located at level 3 of the octree in the kth prediction reference frame may be obtained, and from these nodes, the prediction node of the current node may be determined.
In one example, if the number of prediction nodes of the current node in the kth prediction reference frame is 1, the node with the smallest difference between its occupancy information and the occupancy information of the current node may be selected from the nodes of the kth prediction reference frame at the same division depth as the current node; this node, denoted node 1, is identified as the prediction node of the current node in the kth prediction reference frame.
In another example, if the number of prediction nodes of the current node in the kth prediction reference frame is greater than 1, the above-determined node 1, together with at least one neighbor node of node 1 in the kth prediction reference frame, for example at least one neighbor node that is coplanar, collinear, or co-point with node 1, are determined as the prediction nodes of the current node in the kth prediction reference frame.
In a second mode, the step of determining at least one prediction node of the current node in the kth prediction reference frame in S101-A1 includes the following steps S101-A11 to S101-A13:
S101-A11, determining M neighbor nodes of the current node in the current frame to be decoded, wherein the M neighbor nodes include the current node, and M is a positive integer;
S101-A12, for the ith neighbor node among the M neighbor nodes, determining a corresponding node of the ith neighbor node in the kth prediction reference frame, wherein i is a positive integer less than or equal to M;
S101-A13, determining at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M neighbor nodes in the kth prediction reference frame.
In this implementation, before determining at least one prediction node of the current node in the kth prediction reference frame, the decoding end first determines M neighbor nodes of the current node in the current frame to be decoded, where the M neighbor nodes include the current node itself.
It should be noted that the embodiment of the present application does not limit the specific manner of determining the M neighbor nodes of the current node.
In one example, the M neighbor nodes of the current node include at least one of the neighbor nodes that are coplanar, collinear, or co-point with the current node in the current frame to be decoded. As shown in Fig. 13, the current node has 6 coplanar nodes, 12 collinear nodes, and 8 co-point nodes.
In another example, in addition to at least one of the neighbor nodes that are coplanar, collinear, or co-point with the current node, the M neighbor nodes of the current node may also include other nodes within a reference neighborhood in the current frame to be decoded, which is not limited by the embodiment of the present application.
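By way of illustration, the following sketch enumerates the three neighbor types shown in Fig. 13 as coordinate offsets: a coplanar (face-sharing) neighbor has exactly one nonzero offset component, a collinear (edge-sharing) neighbor two, and a co-point (corner-sharing) neighbor three. The function name is illustrative.

```python
# The 26 neighbors of a node, grouped by adjacency type.
from itertools import product

def neighbor_offsets():
    coplanar, collinear, co_point = [], [], []
    for offset in product((-1, 0, 1), repeat=3):
        nonzero = sum(1 for c in offset if c != 0)
        if nonzero == 1:
            coplanar.append(offset)   # shares a face with the node
        elif nonzero == 2:
            collinear.append(offset)  # shares an edge with the node
        elif nonzero == 3:
            co_point.append(offset)   # shares only a corner with the node
    return coplanar, collinear, co_point

coplanar, collinear, co_point = neighbor_offsets()
print(len(coplanar), len(collinear), len(co_point))  # 6 12 8, matching Fig. 13
```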
After determining the M neighbor nodes of the current node in the current frame to be decoded based on the above steps, the decoding end determines the corresponding node of each neighbor node in the kth prediction reference frame, and further determines at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M neighbor nodes in the kth prediction reference frame.
The embodiment of the application does not limit the specific implementation mode of S101-A13.
In one possible implementation, at least one corresponding node is screened from the corresponding nodes of the M neighbor nodes in the kth prediction reference frame and used as the at least one prediction node of the current node in the kth prediction reference frame. For example, among the corresponding nodes of the M neighbor nodes in the kth prediction reference frame, the at least one corresponding node with the smallest difference between its occupancy information and the occupancy information of the current node is screened out and used as the at least one prediction node of the current node in the kth prediction reference frame. The difference between the occupancy information of a corresponding node and the occupancy information of the current node may be determined, for example, by performing an exclusive-or operation on the occupancy information of the corresponding node and the occupancy information of the current node, and using the exclusive-or result as the difference between the two.
In another possible implementation, the decoding end determines the corresponding nodes of the M neighbor nodes in the kth prediction reference frame as the at least one prediction node of the current node in the kth prediction reference frame. For example, each of the M neighbor nodes has one corresponding node in the kth prediction reference frame, giving M corresponding nodes in total, and these M corresponding nodes are determined as the prediction nodes of the current node in the kth prediction reference frame, for a total of M prediction nodes.
The above description is directed to a process of determining at least one predicted node of a current node in a kth predicted reference frame. In this way, the decoding end may determine at least one prediction node of the current node in each of the K prediction reference frames in the same manner as described above.
For example, if the current frame to be decoded is a P frame, the K prediction reference frames include a forward frame of the current frame to be decoded. At this time, the decoding end can determine at least one prediction node of the current node in the forward frame based on the above steps. Illustratively, as shown in Fig. 15A, assume that the current node has 3 neighbor nodes, denoted node 11, node 12 (the current node itself), and node 13, and that each of the 3 neighbor nodes corresponds to one node in the forward frame, denoted node 21, node 22, and node 23; then node 21, node 22, and node 23 are determined as 3 prediction nodes of the current node in the forward frame, or 1 or 2 nodes are selected from node 21, node 22, and node 23 and determined as 1 or 2 prediction nodes of the current node in the forward frame.
For another example, if the current frame to be decoded is a B frame, the K prediction reference frames include a forward frame and a backward frame of the current frame to be decoded. At this time, the decoding end may determine at least one prediction node of the current node in the forward frame and at least one prediction node of the current node in the backward frame based on the above steps. Illustratively, as shown in Fig. 15B, assume that the current node has 3 neighbor nodes, denoted node 11, node 12, and node 13; their corresponding nodes in the forward frame are denoted node 21, node 22, and node 23, and their corresponding nodes in the backward frame are denoted node 41, node 42, and node 43. The decoding end may thus determine node 21, node 22, and node 23 as 3 prediction nodes of the current node in the forward frame, or select 1 or 2 of them as 1 or 2 prediction nodes of the current node in the forward frame. Similarly, the decoding end may determine node 41, node 42, and node 43 as 3 prediction nodes of the current node in the backward frame, or select 1 or 2 of them as 1 or 2 prediction nodes of the current node in the backward frame.
After determining at least one prediction node of the current node in each of the K prediction reference frames, the decoding end executes the above step S101-A2, that is, determines the N prediction nodes of the current node based on the at least one prediction node of the current node in each of the K prediction reference frames.
In one example, at least one predicted node of the current node in the K predicted reference frames is determined as N predicted nodes of the current node.
For example, K=2, i.e., the K prediction reference frames include a first prediction reference frame and a second prediction reference frame. Assuming that the current node has 2 prediction nodes in the first prediction reference frame and 3 prediction nodes in the second prediction reference frame, it can be determined that the current node has 5 prediction nodes, i.e., N=5.
In another example, N predicted nodes of the current node are filtered out from at least one predicted node of the K predicted reference frames.
Continuing with the above example, assume that K=2, i.e., the K prediction reference frames include a first prediction reference frame and a second prediction reference frame, and assume that the current node has 2 prediction nodes in the first prediction reference frame and 3 prediction nodes in the second prediction reference frame. From these 5 prediction nodes, 3 are selected as the final prediction nodes of the current node. For example, from the 5 prediction nodes, the 3 prediction nodes with the smallest difference between their occupancy information and the occupancy information of the current node are selected and determined as the final prediction nodes of the current node.
In the second mode described above, after determining the M neighbor nodes of the current node in the current frame to be decoded, the decoding end determines the corresponding node of each of the M neighbor nodes in the kth prediction reference frame, and further determines at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M neighbor nodes.
In a third mode, the step of determining at least one prediction node of the current node in the kth prediction reference frame in S101-A1 includes the following steps S101-B11 to S101-B13:
S101-B11, determining a corresponding node of the current node in the kth prediction reference frame;
S101-B12, determining at least one neighbor node of the corresponding node;
S101-B13, determining the at least one neighbor node as the at least one prediction node of the current node in the kth prediction reference frame.
In this third mode, for each of the K prediction reference frames, the decoding end first determines the corresponding node of the current node in that prediction reference frame. For example, it determines corresponding node 1 of the current node in prediction reference frame 1, and corresponding node 2 of the current node in prediction reference frame 2. Then, the decoding end determines at least one neighbor node of each corresponding node. For example, at least one neighbor node of corresponding node 1 is determined in prediction reference frame 1, and at least one neighbor node of corresponding node 2 is determined in prediction reference frame 2. In this way, the at least one neighbor node of corresponding node 1 in prediction reference frame 1 can be determined as at least one prediction node of the current node in prediction reference frame 1, and the at least one neighbor node of corresponding node 2 in prediction reference frame 2 can be determined as at least one prediction node of the current node in prediction reference frame 2.
The process of determining the corresponding node of the ith neighbor node in the kth prediction reference frame in S101-A12 of the second mode is basically the same as the process of determining the corresponding node of the current node in the kth prediction reference frame in S101-B11 of the third mode. For convenience of description, the ith neighbor node and the current node are both denoted as the ith node, and the specific procedure for determining the corresponding node of the ith node in the kth prediction reference frame is described below.
The decoding end may determine the corresponding node of the ith node in the kth prediction reference frame in at least the following modes:
In mode 1, a node of the kth prediction reference frame with the same division depth as the ith node is determined as the corresponding node of the ith node.
By way of example, assume that the ith node is located at layer 3 of the octree of the current frame to be decoded, so that each node located at layer 3 of the octree in the kth prediction reference frame can be obtained, and the corresponding node of the ith node is determined from these nodes. For example, among the nodes of the kth prediction reference frame at the same division depth as the ith node, the node with the smallest difference between its occupancy information and the occupancy information of the ith node is selected and determined as the corresponding node of the ith node in the kth prediction reference frame.
In mode 2, the above-mentioned S101-A12 and S101-B11 include the following steps:
S101-A121, determining the parent node of the ith node in the current frame to be decoded, denoted the ith parent node;
S101-A122, determining the matching node of the ith parent node in the kth prediction reference frame, denoted the ith matching node;
S101-A123, determining one of the child nodes of the ith matching node as the corresponding node of the ith node in the kth prediction reference frame.
In this mode 2, for the ith node, the decoding end determines the parent node of the ith node in the current frame to be decoded, and then determines the matching node of that parent node in the kth prediction reference frame. For convenience of description, the parent node of the ith node is denoted as the ith parent node, and the matching node of the ith parent node in the kth prediction reference frame is denoted as the ith matching node. Then, one of the child nodes of the ith matching node is determined as the corresponding node of the ith node in the kth prediction reference frame, so that the corresponding node of the ith node in the kth prediction reference frame is accurately determined.
The following describes a specific procedure for determining a matching node of the ith parent node in the kth predicted reference frame in S101-a 122.
The embodiment of the application does not limit the specific mode of determining the matching node of the ith father node in the kth prediction reference frame by the decoding end.
In some embodiments, the division depth of the ith parent node in the current frame to be decoded is determined; for example, the ith parent node is at layer 2 of the octree of the current frame to be decoded. The decoding end can then determine one of the nodes in the kth prediction reference frame with the same division depth as the ith parent node as the matching node of the ith parent node in the kth prediction reference frame. For example, one of the nodes at layer 2 of the kth prediction reference frame is determined as the matching node of the ith parent node in the kth prediction reference frame.
In some embodiments, the decoding end determines the matching node of the ith parent node in the kth prediction reference frame based on the occupancy information of the ith parent node. Specifically, the occupancy information of the ith parent node in the current frame to be decoded has already been decoded, and the occupancy information of each node in the kth prediction reference frame has also already been decoded. Thus, the decoding end can search for the matching node of the ith parent node in the kth prediction reference frame based on the occupancy information of the ith parent node.
For example, the node in the kth prediction reference frame whose occupancy information has the smallest difference from the occupancy information of the ith parent node is determined as the matching node of the ith parent node in the kth prediction reference frame.
For example, assuming that the occupancy information of the ith parent node is 11001101, the node whose occupancy information has the smallest difference from 11001101 is searched for in the kth prediction reference frame. Specifically, the decoding end performs an exclusive-or operation between the occupancy information of the ith parent node and the occupancy information of each node in the kth prediction reference frame, and determines the node in the kth prediction reference frame with the smallest exclusive-or result as the matching node of the ith parent node in the kth prediction reference frame.
For example, assuming that the occupancy information of node 1 in the kth prediction reference frame is 10001111, an exclusive-or operation is performed on 11001101 and 10001111: the 1st bits of 11001101 and 10001111 are both 1, so the exclusive-or result of the 1st bits is 0; the 2nd bits of 11001101 and 10001111 differ, so the exclusive-or result of the 2nd bits is 1; and so on, giving the exclusive-or result of 11001101 and 10001111 as 0+1+0+0+0+0+1+0=2. In this manner, the decoding end can determine the exclusive-or result between the occupancy information of the ith parent node and the occupancy information of each node in the kth prediction reference frame, and determine the node in the kth prediction reference frame with the smallest exclusive-or result as the matching node of the ith parent node in the kth prediction reference frame.
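A minimal sketch of this matching search, assuming 8-bit occupancy codes and illustrative node identifiers:

```python
# Occupancy matching by exclusive-or (XOR) bit count.
def occupancy_difference(a, b):
    return bin(a ^ b).count("1")  # number of differing bits

parent_occupancy = 0b11001101      # occupancy information of the ith parent node
reference_nodes = {                # node -> occupancy in the kth prediction reference frame
    "node 1": 0b10001111,
    "node 2": 0b11001100,
}

print(occupancy_difference(parent_occupancy, reference_nodes["node 1"]))  # 2, as in the text
match = min(reference_nodes, key=lambda n: occupancy_difference(parent_occupancy, reference_nodes[n]))
print(match)  # node 2 (difference 1) is taken as the matching node
```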
Based on the steps, the decoding end can determine the matching node of the ith father node in the kth predicted reference frame. For ease of description, this matching node is denoted as the i-th matching node.
Then, the decoding end determines one of the child nodes of the ith matching node as the corresponding node of the ith node in the kth prediction reference frame.
For example, the decoding end determines a default child node among the child nodes included in the ith matching node as the corresponding node of the ith node in the kth prediction reference frame; for instance, the 1st child node of the ith matching node is determined as the corresponding node of the ith node in the kth prediction reference frame.
For another example, the decoding end determines the first sequence number of the ith node among the child nodes included in its parent node, and determines the child node with that first sequence number among the child nodes of the ith matching node as the corresponding node of the ith node in the kth prediction reference frame. Illustratively, as shown in Fig. 14, the ith node is the 2nd child node of the ith parent node, so the first sequence number is 2; thus, the 2nd child node of the ith matching node can be determined as the corresponding node of the ith node.
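A minimal sketch combining the two options above (a default child node, or the child with the same first sequence number); the names, and the fallback to the default child when the same-index child is unoccupied, are assumptions of this example:

```python
# Corresponding node via the ith matching node's children (illustrative).
def corresponding_child(child_index, matching_node_children, default_index=0):
    """matching_node_children maps an occupied child index (0..7) to a child node."""
    child = matching_node_children.get(child_index)
    if child is not None:
        return child                                  # same first sequence number
    return matching_node_children.get(default_index)  # assumed fallback: default child

# The ith node is the 2nd child of its parent (index 1), as in Fig. 14.
children_of_matching_node = {0: "child A", 1: "child B", 5: "child C"}
print(corresponding_child(1, children_of_matching_node))  # child B
```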
The above describes the process of determining, for the ith neighbor node among the M neighbor nodes, its corresponding node in the kth prediction reference frame, as well as the corresponding node of the current node in the kth prediction reference frame. In this way, the decoding end can determine the N prediction nodes of the current node in the prediction reference frames using the second or third mode.
Based on the above steps, the decoding end determines N predicted nodes of the current node in the predicted reference frame of the current frame to be decoded, and then performs the following step S102.
S102, based on the geometric decoding information of the points in the N prediction nodes, performing prediction decoding on the position information of the points in the current node.
Because of the correlation between adjacent frames of the point cloud, the embodiment of the present application refers to inter-frame correlation information when predictively decoding the position information of the points in the current node. Specifically, based on the geometric decoding information of the points in the N prediction nodes of the current node, the position information of the points in the current node is predictively decoded, so that the coding and decoding efficiency and performance of the point cloud are improved.
In one example, as shown in Fig. 16A, the process of directly encoding the current node by the encoding end includes determining whether the current node is eligible for direct encoding, and if it is determined that the current node is eligible for direct encoding, setting IDCMEligible to true. It is then judged whether the number of points included in the current node is smaller than a preset threshold; if so, it is determined that the current node is encoded in the direct coding mode, i.e., the number of points of the current node and the geometric information of the points in the current node are directly encoded.
Correspondingly, when the decoding end decodes the current node, as shown in Fig. 16B, the decoding end first determines whether the current node is eligible for direct decoding, and if so, sets IDCMEligible to true. Next, the geometric information of the points in the current node is decoded.
In the embodiment of the present application, predictively decoding the coordinate information of the points in the current node based on the geometric decoding information of the N prediction nodes can be understood as using the geometric decoding information of the N prediction nodes as context for the predictive decoding of the coordinate information of the points in the current node. For example, the decoding end determines the index of a context model based on the geometric decoding information of the N prediction nodes, determines a target context model from a plurality of preset context models based on that index, and predictively decodes the coordinate information of the points in the current node using the target context model.
In the embodiment of the present application, based on the geometric decoding information of N prediction nodes, the process of performing prediction decoding on the coordinate information of each point in the current node is basically the same, and for convenience of description, the description will be given here taking the prediction decoding on the coordinate information of the current point in the current node as an example.
In some embodiments, the step S102 includes the following steps:
S102-A, determining an index of a context model based on geometric decoding information of N prediction nodes;
S102-B, determining a context model based on the index of the context model;
S102-C, predictive decoding is carried out on the coordinate information of the current point in the current node by using a context model.
In the embodiment of the present application, a plurality of context models, for example Q context models, are set for the decoding of the coordinate information; the embodiment of the present application does not limit the specific number of context models corresponding to the coordinate information, as long as Q is greater than 1. That is, in the embodiment of the present application, an optimal context model is selected from the Q context models to predictively decode the coordinate information of the current point in the current node, so as to improve the decoding efficiency of the coordinate information of the current point.
Illustratively, the coordinate information corresponds to a plurality of context models as shown in Table 4:
Table 4
| Index | Context model |
| 0 | Context model A |
| 1 | Context model B |
| ...... | ...... |
In this way, the decoding end determines the index of the context model based on the geometric decoding information of the N prediction nodes. Next, based on the index of the context model, one context model is selected from the context models corresponding to table 4 to perform predictive decoding on the coordinate information of the current point in the current node.
The geometric decoding information of the prediction node in the embodiment of the application can be understood as any information involved in the geometric decoding process of the prediction node. For example, the number of points included in the prediction node, the occupancy information of the prediction node, the decoding mode of the prediction node, the geometric information of the points in the prediction node, and the like are included.
In some embodiments, the geometric decoding information of the prediction node includes direct decoding information of the prediction node and/or coordinate information of a point in the prediction node, where the direct decoding information of the prediction node is used to indicate whether the prediction node satisfies a condition for decoding in a direct decoding manner.
Based on this, the step S102-A includes the following step S102-A1:
S102-A1, determining a first context index based on direct decoding information of N prediction nodes, and/or determining a second context index based on coordinate information of points of the N prediction nodes.
Correspondingly, the above step S102-B includes the following step S102-B1:
S102-B1, selecting a context model from a plurality of preset context models based on the first context index and/or the second context index.
In this embodiment, if the geometric decoding information of the prediction nodes includes the direct decoding information of the prediction nodes and/or the coordinate information of the points in the prediction nodes, the decoding end may determine the first context index based on the direct decoding information of the N prediction nodes, and/or determine the second context index based on the coordinate information of the points in the N prediction nodes, and then select the final context model from the preset plurality of context models based on the first context index and/or the second context index.
It can be seen that in this embodiment, the manner in which the decoding end determines the context model includes, but is not limited to, the following ways:
In one possible implementation manner, if the geometric decoding information of the prediction node includes direct decoding information of the prediction node, the process of determining the context model may be to determine a first context index based on the direct decoding information of the N prediction nodes, and further select a final context model from a plurality of preset context models based on the first context index to decode the coordinate information of the current point.
For example, the decoding end selects a final context model from the context models shown in table 4 based on the first context index.
In another possible implementation manner, if the geometric decoding information of the prediction node includes coordinate information of points in the prediction node, the process of determining the context model may be to determine a second context index based on the coordinate information of points in the N prediction nodes, and further select a final context model from a plurality of preset context models based on the second context index to decode the coordinate information of the current point.
For example, the decoding end selects a final context model from the context models shown in table 4 based on the second context index.
In another possible implementation, if the geometric decoding information of the prediction nodes includes both the direct decoding information of the prediction nodes and the coordinate information of the points in the prediction nodes, the process of determining the context model may be: determine the first context index based on the direct decoding information of the N prediction nodes, determine the second context index based on the coordinate information of the points in the N prediction nodes, and then select the final context model from the preset plurality of context models based on the first context index and the second context index to decode the coordinate information of the current point.
Illustratively, the correspondence of the first context index, the second context index, and the context model is shown in table 5:
TABLE 5
| | Second context index 1 | Second context index 2 | Second context index 3 | ...... |
| First context index 1 | Context model 11 | Context model 12 | Context model 13 | ...... |
| First context index 2 | Context model 21 | Context model 22 | Context model 23 | ...... |
| First context index 3 | Context model 31 | Context model 32 | Context model 33 | ...... |
| ...... | ...... | ...... | ...... | ...... |
In this manner, after determining the first context index based on the direct decoding information of the N prediction nodes and determining the second context index based on the coordinate information of the points in the N prediction nodes, the decoding end looks up Table 5 above to obtain the final context model. For example, the decoding end determines that the first context index is first context index 2 based on the direct decoding information of the N prediction nodes, and determines that the second context index is second context index 3 based on the coordinate information of the points in the N prediction nodes; looking up Table 5, the final context model is context model 23, and the decoding end decodes the coordinate information of the current point using context model 23.
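A minimal sketch of this two-index lookup, with placeholder strings standing in for real context models:

```python
# Selecting the final context model from a Table 5-style mapping.
context_models = {
    (1, 1): "context model 11", (1, 2): "context model 12", (1, 3): "context model 13",
    (2, 1): "context model 21", (2, 2): "context model 22", (2, 3): "context model 23",
    (3, 1): "context model 31", (3, 2): "context model 32", (3, 3): "context model 33",
}

first_index, second_index = 2, 3  # the example pair from the text
print(context_models[(first_index, second_index)])  # context model 23
```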
The specific procedure for determining the first context index based on the direct decoding information of the N prediction nodes in S102-A1 is described below.
In the embodiment of the present application, the manner of determining the first context index by the decoding end includes, but is not limited to, the following:
In one mode, the step S102-A1 includes the following steps S102-A1-11 and S102-A1-12:
S102-A1-11, for any one of the N prediction nodes, determining the first value corresponding to the prediction node based on the direct decoding information of that prediction node.
In this aspect, for each of the N prediction nodes, a first value corresponding to the prediction node is determined based on the direct decoding information of the prediction node, and finally, a first context index is determined based on the first values corresponding to the N prediction nodes.
The process of determining the first value corresponding to the predicted node is described below.
As can be seen from the above, the direct decoding information of the prediction node is used to indicate whether the prediction node satisfies the condition for decoding by the direct decoding method. The embodiment of the application does not limit the specific content of the direct decoding information.
In some embodiments, the direct decoding information includes the number of points included in the prediction node. In this case, the first value corresponding to the prediction node may be determined based on the number of points included in the prediction node.
In one example, under the GPCC framework, when the number of points included in the predicted node is greater than or equal to 2, the first value corresponding to the predicted node is determined to be 1, and if the number of points included in the predicted node is less than 2, the first value corresponding to the predicted node is determined to be 0. Under the AVS framework, when the number of points included in the prediction node is greater than or equal to 1, determining that the first numerical value corresponding to the prediction node is 1, and when the number of points included in the prediction node is less than 1, determining that the first numerical value corresponding to the prediction node is 0.
In another example, the number of points included in the predicted node is determined to be a first value corresponding to the predicted node, for example, when the predicted node includes 2 points, the first value corresponding to the predicted node is determined to be 2.
In some embodiments, the direct decoding information of the prediction node includes the direct decoding mode of the prediction node, and the above step S102-A1-11 includes: determining the number of the direct decoding mode of the prediction node as the first value corresponding to the prediction node.
For example, under the GPCC framework, if the direct decoding mode of the prediction node is mode 0, it is determined that the first value corresponding to the prediction node is 0. If the direct decoding mode of the prediction node is mode 1, determining that the first value corresponding to the prediction node is 1. If the direct decoding mode of the prediction node is mode 2, determining that the first value corresponding to the prediction node is 2.
For another example, in the AVS framework, if the direct decoding mode of the prediction node is mode 0, the first value corresponding to the prediction node is determined to be 0. If the direct decoding mode of the prediction node is mode 1, determining that the first value corresponding to the prediction node is 1.
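A minimal sketch of the first-value rules described above; the framework parameter and function names are illustrative:

```python
# First value of one prediction node from its direct decoding information.
def first_value_from_point_count(num_points, framework="GPCC"):
    threshold = 2 if framework == "GPCC" else 1  # AVS threshold is 1
    return 1 if num_points >= threshold else 0

def first_value_from_idcm_mode(mode_number):
    # The number of the direct decoding mode (0, 1 or 2 under GPCC;
    # 0 or 1 under AVS) is used directly as the first value.
    return mode_number

print(first_value_from_point_count(2, "GPCC"))  # 1
print(first_value_from_point_count(1, "AVS"))   # 1
print(first_value_from_idcm_mode(2))            # 2
```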
Based on the above steps, the decoding end determines a first value corresponding to each of the N prediction nodes, and then executes the following steps S102-A1-12.
S102-A1-12, determining a first context index based on first numerical values corresponding to the N prediction nodes.
After determining the first values corresponding to the N prediction nodes based on the above steps, the decoding end determines the first context index based on these first values.
Wherein, based on the first values corresponding to the N prediction nodes, determining the first context index at least includes the following implementation manners:
In mode 1, the average of the first values corresponding to the N prediction nodes is determined as the first context index.
Mode 2, S102-A1-12 includes the steps of S102-A1-121 to S102-A1-123:
S102-A1-121, determining a first weight corresponding to a prediction node;
S102-A1-122, carrying out weighting processing on first numerical values corresponding to the N prediction nodes based on the first weight to obtain first weighted prediction values;
S102-A1-123, determining a first context index based on the first weighted prediction value.
In this mode 2, if the current node has a plurality of prediction nodes, i.e., N prediction nodes, then when determining the first context index based on the first values corresponding to the N prediction nodes, a weight, i.e., a first weight, may be determined for each of the N prediction nodes. The first value corresponding to each prediction node can then be weighted based on its first weight, and the first context index determined from the final weighted result, thereby improving the accuracy of determining the first context index based on the geometric decoding information of the N prediction nodes.
The embodiment of the application does not limit the first weights corresponding to the N prediction nodes.
In some embodiments, the first weight corresponding to each of the N prediction nodes is a preset value. As can be seen from the above, the N prediction nodes are determined based on the M neighbor nodes of the current node. Assuming that prediction node 1 is the prediction node corresponding to neighbor node 1: if neighbor node 1 is a coplanar node of the current node, the first weight of prediction node 1 is preset weight 1; if neighbor node 1 is a collinear node of the current node, the first weight of prediction node 1 is preset weight 2; and if neighbor node 1 is a co-point node of the current node, the first weight of prediction node 1 is preset weight 3.
In some embodiments, for each of the N prediction nodes, the first weight corresponding to the prediction node is determined based on the distance between the neighbor node corresponding to the prediction node and the current node. For example, the smaller the distance between the neighbor node and the current node, the stronger the inter-frame correlation between the prediction node corresponding to that neighbor node and the current node, and thus the larger the first weight of the prediction node.
For example, taking prediction node 1 of the N prediction nodes as an example, assume that prediction node 1 is the corresponding node in the prediction reference frame of neighbor node 1 among the M neighbor nodes of the current node, so that the first weight of prediction node 1 can be determined based on the distance between neighbor node 1 and the current node. For example, the inverse of the distance between neighbor node 1 and the current node is determined as the first weight of prediction node 1.
In one example, if neighbor node 1 is a coplanar node of the current node, the first weight of prediction node 1 is 1; if neighbor node 1 is a collinear node of the current node, the first weight of prediction node 1 is a preset weight; and if neighbor node 1 is the current node itself, the first weight of prediction node 1 is a preset weight.
In another example, if neighbor node 1 is a coplanar node of the current node, the first weight of prediction node 1 is a preset weight; if neighbor node 1 is a collinear node of the current node, the first weight of prediction node 1 is a preset weight; and if neighbor node 1 is the current node itself, the first weight of prediction node 1 is a preset weight.
In some embodiments, based on the above steps, after determining the weight corresponding to each of the N prediction nodes, the weights are normalized, and the normalized weight is used as the final first weight of each prediction node.
The embodiment of the present application does not limit the specific manner of weighting the first values corresponding to the N prediction nodes based on the first weights to obtain the first weighted prediction value.
In one example, based on the first weight, a first numerical value corresponding to the N prediction nodes is weighted averaged to obtain a first weighted prediction value.
In another example, based on the first weights, first values corresponding to the N prediction nodes are weighted and summed to obtain a first weighted prediction value.
Based on the above steps, after the first weighted prediction value is determined, the first context index is determined based on the first weighted prediction value; that is, S102-A1-123 includes at least the following examples:
Example 1, a first weighted prediction value is determined as a first context index.
Example 2, a weighted prediction value range in which the first weighted prediction value is located is determined, and an index corresponding to the range is determined as the first context index.
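A minimal sketch of S102-A1-121 to S102-A1-123 covering both examples; the weights and range boundaries are illustrative assumptions:

```python
# Weighting the first values of the N prediction nodes (illustrative).
def first_context_index(first_values, first_weights, ranges=None):
    weighted = sum(v * w for v, w in zip(first_values, first_weights)) / sum(first_weights)
    if ranges is None:
        return round(weighted)               # example 1: use the weighted value directly
    for idx, (low, high) in enumerate(ranges):
        if low <= weighted < high:
            return idx                       # example 2: index of the containing range
    return len(ranges) - 1

values = [1, 0, 2]          # first values of 3 prediction nodes
weights = [1.0, 0.5, 0.5]   # e.g. larger weights for closer neighbor nodes
print(first_context_index(values, weights))                    # 1
print(first_context_index(values, weights, [(0, 1), (1, 3)]))  # 1
```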
The above introduces the process by which the decoding end weights the N prediction nodes to obtain the first context index.
In some embodiments, the decoding end may also determine the first context index in the following manner.
In a second manner, if K is greater than 1, a second weighted prediction value corresponding to each of the K prediction reference frames is determined, and the first context index is then determined based on the second weighted prediction values corresponding to the K prediction reference frames. At this time, the above S102-A1 includes the following steps S102-A1-21 to S102-A1-23:
S102-A1-21, for the jth prediction reference frame among the K prediction reference frames, determining the first value corresponding to each prediction node of the current node in the jth prediction reference frame based on the direct decoding information of that prediction node, wherein j is a positive integer less than or equal to K;
S102-A1-22, determining a first weight corresponding to a predicted node, and carrying out weighting processing on a first numerical value corresponding to the predicted node in a j-th predicted reference frame based on the first weight to obtain a second weighted predicted value corresponding to the j-th predicted reference frame;
S102-A1-23, determining a first context index based on second weighted prediction values corresponding to the K prediction reference frames.
In this second mode, each of the K prediction reference frames is considered as separate context information when determining the first context index. Specifically, direct decoding information of a prediction node included in each of the K prediction reference frames is determined, a second weighted prediction value corresponding to each prediction reference frame is determined, and then a first context index is determined based on the second weighted prediction value corresponding to each prediction reference frame, so that accurate selection of the first context index is realized, and further decoding efficiency of the point cloud is improved.
In the embodiment of the present application, the specific manner of determining the second weighted prediction value corresponding to each of the K prediction reference frames by the decoding end is the same, and for convenience of description, the j-th prediction reference frame of the K prediction reference frames is taken as an example for illustration.
In an embodiment of the application, the current node comprises at least one prediction node in a jth prediction reference frame, such that a first value of the at least one prediction node is determined based on direct decoding information of the at least one prediction node in the jth prediction reference frame.
For example, the jth prediction reference frame includes 2 prediction nodes of the current node, denoted prediction node 1 and prediction node 2; the first value of prediction node 1 is determined based on the direct decoding information of prediction node 1, and the first value of prediction node 2 is determined based on the direct decoding information of prediction node 2. The process of determining the first value corresponding to a prediction node based on its direct decoding information may refer to the description of the above embodiments; illustratively, the first value corresponding to the prediction node is determined based on its direct decoding mode, for example, the number (0, 1, or 2) of the direct decoding mode of the prediction node is determined as the first value corresponding to the prediction node.
After determining a first numerical value of at least one prediction node included in the jth prediction reference frame, the decoding end determines first weights respectively corresponding to the at least one prediction node, and performs weighting processing on the first numerical value corresponding to the at least one prediction node based on the first weights to obtain a second weighted prediction value corresponding to the jth prediction reference frame.
In one example, based on the first weight, a first value corresponding to a prediction node in a jth prediction reference frame is weighted averaged to obtain a second weighted prediction value corresponding to the jth prediction reference frame.
In another example, based on the first weight, a first value corresponding to a prediction node in the jth predicted reference frame is weighted and summed to obtain a second weighted prediction value corresponding to the jth predicted reference frame.
The determining process of the first weight may refer to the description of the foregoing embodiments, which is not repeated herein.
The above description is directed to a process of determining the second weighted prediction value corresponding to the jth prediction reference frame in the K prediction reference frames, where the second weighted prediction value corresponding to the other prediction reference frames in the K prediction reference frames is determined by referring to the mode corresponding to the jth prediction reference frame.
After determining the second weighted prediction value corresponding to each of the K prediction reference frames, the decoding end executes the step S102-A1-23.
The specific mode of determining the first context index based on the second weighted prediction values corresponding to the K prediction reference frames is not limited.
In some embodiments, the decoding end determines an average value of the second weighted prediction values corresponding to the K prediction reference frames as the first context index.
In some embodiments, the decoding end determines second weights corresponding to the K prediction reference frames, and performs weighting processing on second weighted prediction values corresponding to the K prediction reference frames based on the second weights, so as to obtain the first context index.
In this embodiment, the decoding end first determines a second weight corresponding to each of the K prediction reference frames. The embodiment of the application does not limit the determination of the second weight corresponding to each prediction reference frame in the K prediction reference frames.
In some embodiments, the second weight corresponding to each of the K predicted reference frames is a preset value. As can be seen from the above, the K predicted reference frames are forward frames and/or backward frames of the current frame to be decoded. Assuming that the predicted reference frame 1 is a forward frame of the current frame to be decoded, the second weight corresponding to the predicted reference frame 1 is a preset weight 1, and if the predicted reference frame 1 is a backward frame of the current frame to be decoded, the second weight corresponding to the predicted reference frame 1 is a preset weight 2.
In some embodiments, the second weight corresponding to a prediction reference frame is determined based on the temporal gap between the prediction reference frame and the current frame to be decoded. In the embodiment of the present application, each point cloud frame includes time information, which may be the time at which the point cloud acquisition device acquired that frame of point cloud. Based on this, the smaller the time difference between the prediction reference frame and the current frame to be decoded, the stronger the inter-frame correlation between them, and thus the larger the second weight corresponding to the prediction reference frame. For example, the inverse of the time difference between the prediction reference frame and the current frame to be decoded may be determined as the second weight corresponding to the prediction reference frame.
After determining the second weight corresponding to each of the K prediction reference frames, the decoding end weights the second weighted prediction values corresponding to the K prediction reference frames based on the second weights to obtain the first context index.
For example, let K=2, i.e., the current frame to be decoded has 2 prediction reference frames, which include a forward frame and a backward frame of the current frame to be decoded. Let the second weight corresponding to the forward frame be W1 and the second weight corresponding to the backward frame be W2; the second weighted prediction value corresponding to the forward frame and the second weighted prediction value corresponding to the backward frame are then weighted based on W1 and W2 to obtain the first context index.
In one example, based on the second weights, second weighted prediction values corresponding to the K prediction reference frames are weighted averaged to obtain the first context index.
In another example, based on the second weights, second weighted prediction values corresponding to the K prediction reference frames are weighted and summed to obtain the first context index.
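A minimal sketch of this two-level weighting, assuming the second weight of a frame is the inverse of its temporal gap to the current frame and that weighted averages are used at both levels:

```python
# Two-level weighting across K prediction reference frames (illustrative).
def second_weighted_value(first_values, first_weights):
    # Weighted average of the first values of the prediction nodes in one frame.
    return sum(v * w for v, w in zip(first_values, first_weights)) / sum(first_weights)

def first_context_index(per_frame_values, frame_time_gaps):
    # Second weight of a frame: inverse of its temporal gap to the current frame.
    frame_weights = [1.0 / gap for gap in frame_time_gaps]
    weighted = sum(v * w for v, w in zip(per_frame_values, frame_weights)) / sum(frame_weights)
    return round(weighted)

forward = second_weighted_value([1, 2], [1.0, 0.5])   # forward frame with 2 prediction nodes
backward = second_weighted_value([0, 1], [1.0, 1.0])  # backward frame with 2 prediction nodes
print(first_context_index([forward, backward], frame_time_gaps=[1, 2]))  # 1
```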
The above describes the procedure for determining the first context index at the decoding end.
The procedure for determining the second context index at the decoding end is described below.
As can be seen from the above process of determining the prediction nodes, each of the N prediction nodes includes one point or a plurality of points, and if each of the N prediction nodes includes one point, the second context index is determined using the one point included in each of the N prediction nodes.
In some embodiments, if a plurality of points are included in the prediction node, a point is selected from the plurality of points to determine the second context index. At this time, the above-mentioned S102-A1 includes the steps of S102-A1-31 and S102-A1-32 as follows:
S102-A1-31, selecting a first point corresponding to a current point of a current node from points included in any one of N prediction nodes;
S102-A1-32, determining a second context index based on the coordinate information of the first point included in the N prediction nodes.
For example, assume that the N prediction nodes include prediction node 1 and prediction node 2, where prediction node 1 includes point 1 and point 2, and prediction node 2 includes point 3, point 4, and point 5. One point is selected from point 1 and point 2 of prediction node 1 as a first point, and one point is selected from point 3, point 4, and point 5 of prediction node 2 as a first point. The coordinate information of the current point can then be predictively decoded based on the coordinate information of the first point in prediction node 1 and the first point in prediction node 2.
The embodiment of the application does not limit the specific mode of selecting the first point corresponding to the current point of the current node from the points included in the predicted node.
In one possible implementation, the point in the prediction node whose order matches that of the current point is determined as the first point corresponding to the current point. For example, assuming the current point is the 2nd point in the current node, point 2 in prediction node 1 may be determined as the first point corresponding to the current point, and point 4 in prediction node 2 may be determined as the first point corresponding to the current point. For another example, if a prediction node includes only one point, the point included in that prediction node is determined as the first point corresponding to the current point.
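As an illustration of this order-matching rule, a minimal sketch follows, assuming 0-based point order inside each node; the fallback to the last point when the prediction node holds fewer points is an assumption, not stated in the application.

    def select_first_point(pred_points, current_index):
        # A prediction node with a single point contributes that point.
        if len(pred_points) == 1:
            return pred_points[0]
        # Otherwise pick the point whose order matches the current point
        # (clamped to the last point as an assumed fallback).
        return pred_points[min(current_index, len(pred_points) - 1)]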
In one possible implementation manner, if the encoding end selects a first point corresponding to the current point from the points included in the prediction node based on the rate distortion cost (or the approximate cost), the encoding end writes the identification information of the first point in the prediction node into the code stream, so that the decoding end obtains the first point in the prediction node by decoding the code stream.
The decoding end determines, for each of the N prediction nodes, a first point corresponding to the current point in each prediction node based on the above method, and further executes the step S102-A1-32 described above.
In the embodiment of the present application, the decoding end decodes the coordinate information of the current point on each coordinate axis separately. Based on this, the step S102-A1-32 includes the following step S102-A1-321:
S102-A1-321, determining a second context index corresponding to the ith coordinate axis based on the coordinate information of the first point on the ith coordinate axis, which is included in the N prediction nodes.
The ith coordinate axis may be an X axis, a Y axis, or a Z axis, which is not limited in this embodiment of the present application.
In some embodiments, if the point cloud is a LiDAR point cloud, the ith coordinate axis is the X axis or the Y axis.
In some embodiments, if the point cloud is a human-vision-oriented point cloud, the ith coordinate axis may be any one of the X axis, the Y axis, or the Z axis.
In the embodiment of the application, when the decoding end decodes the coordinate information of the current point on the ith coordinate axis, the second context index corresponding to the ith coordinate axis is determined based on the coordinate information of the first point on the ith coordinate axis, which is included in the N prediction nodes. Therefore, based on the first context index and/or the second context index corresponding to the ith coordinate axis, a context model corresponding to the ith coordinate axis is selected from a plurality of context models, and the context model corresponding to the ith coordinate axis is further used for predictive decoding of the coordinate information of the current point on the ith coordinate axis. For example, the decoding end determines a second context index corresponding to the X axis based on coordinate information of the first point on the X axis included in the N prediction nodes, selects a context model corresponding to the X axis from the multiple context models based on the first context index and/or the second context index corresponding to the X axis, and further performs predictive decoding on coordinate information of the current point on the X axis by using the context model corresponding to the X axis to obtain an X coordinate value of the current point. For another example, the decoding end determines a second context index corresponding to the Y axis based on the coordinate information of the first point on the Y axis included in the N prediction nodes, and selects a context model corresponding to the Y axis from the multiple context models based on the first context index and/or the second context index corresponding to the Y axis, and further performs predictive decoding on the coordinate information of the current point on the Y axis by using the context model corresponding to the Y axis, so as to obtain the Y coordinate value of the current point.
The following describes a process that the decoding end determines the second context index corresponding to the ith coordinate axis based on the coordinate information of the first point on the ith coordinate axis included in the N prediction nodes.
In the embodiment of the present application, the implementation manners of the above S102-A1-321 include, but are not limited to, the following:
In one aspect, first points included in the N prediction nodes are weighted, and a second context index corresponding to the i-th coordinate axis is determined based on the weighted coordinate information. At this time, the above-mentioned S102-A1-321 includes the steps of S102-A1-321-11 to S102-A1-321-13 as follows:
S102-A1-321-11, determining a first weight corresponding to a prediction node;
S102-A1-321-12, carrying out weighting processing on coordinate information of first points included in N prediction nodes based on first weights to obtain first weighted points;
S102-A1-321-13, determining a second context index corresponding to the ith coordinate axis based on the coordinate information of the first weighting point on the ith coordinate axis.
In this mode, if the current node has a plurality of prediction nodes, i.e., N prediction nodes, then when determining the second context index corresponding to the ith coordinate axis based on the coordinate information of the first points included in the N prediction nodes, a weight, namely the first weight, may be determined for each of the N prediction nodes. In this way, the coordinate information of the first point included in each prediction node can be weighted based on the first weight of that prediction node to obtain the first weighting point, and the second context index corresponding to the ith coordinate axis is then determined from the coordinate information of the first weighting point on the ith coordinate axis, thereby improving the accuracy of decoding the current point based on the geometric decoding information of the N prediction nodes.
The process of determining the first weights corresponding to the N prediction nodes in the embodiment of the present application may refer to the description of the foregoing embodiment, which is not repeated herein.
After determining a first weight corresponding to each of the N prediction nodes, the decoding end performs weighting processing on coordinate information of a first point included in the N prediction nodes based on the first weight to obtain a first weighting point.
The embodiment of the present application does not limit the specific manner of weighting the coordinate information of the first points included in the N prediction nodes based on the first weights to obtain the first weighting point.
In one example, based on the first weight, the coordinate information of the first points included in the N prediction nodes is weighted and averaged to obtain a first weighted point.
After determining the first weighting point based on the above steps, the decoding end determines the second context index corresponding to the ith coordinate axis based on the coordinate information of the first weighting point on the ith coordinate axis.
As can be seen from the above, the first weighting point is obtained by weighting the first points of the N prediction nodes, where each bit of a first point in a prediction node can take only 2 values, 0 or 1. Thus, in some embodiments, the value of each bit of the first weighting point obtained by weighting the first points of the N prediction nodes is also constrained to be 0 or 1. In this way, when decoding the ith bit of the current point on the ith coordinate axis, the second context index corresponding to the ith bit on the ith coordinate axis is determined based on the value of the ith bit of the first weighting point on the ith coordinate axis. For example, if the value of the ith bit of the first weighting point on the ith coordinate axis is 0, the second context index corresponding to the ith bit on the ith coordinate axis is determined to be 0; if that value is 1, the second context index is determined to be 1. Finally, the decoding end determines the context model corresponding to the ith bit on the ith coordinate axis based on the first context index and/or the second context index corresponding to that bit, and uses this context model to predictively decode the value of the ith bit of the current point on the ith coordinate axis.
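The per-bit derivation just described can be sketched as follows; the weighted-average-then-threshold rule that keeps each bit at 0 or 1 is one plausible reading (rounding at 0.5 is an assumption), and all names are illustrative.

    def second_context_index(first_point_coords, first_weights, t):
        # first_point_coords: coordinate values on the ith axis of the first
        # point of each of the N prediction nodes.
        bits = [(c >> t) & 1 for c in first_point_coords]
        avg = (sum(w * b for w, b in zip(first_weights, bits))
               / sum(first_weights))
        return 1 if avg >= 0.5 else 0   # constrain the result to 0 or 1

    # Example: bit 2 of three first points, weighted 2:1:1.
    ctx2 = second_context_index([12, 13, 8], [2.0, 1.0, 1.0], t=2)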
In addition to mode one above, the decoding end may also determine the second context index in the following manner.
In the second mode, if K is greater than 1, the first points included in the prediction nodes in each of the K prediction reference frames are weighted, and the second context index corresponding to the ith coordinate axis is determined based on the weighted coordinate information. At this time, the above S102-A1-321 includes the following steps S102-A1-321-21 to S102-A1-321-23:
S102-A1-321-21, for the jth prediction reference frame among the K prediction reference frames, determining a first weight corresponding to the prediction nodes in the jth prediction reference frame;
S102-A1-321-22, carrying out weighting processing on coordinate information of a first point included in a prediction node in a jth prediction reference frame based on the first weight to obtain a second weighting point corresponding to the jth prediction reference frame, wherein j is a positive integer less than or equal to K;
S102-A1-321-23, determining a second context index corresponding to the ith coordinate axis based on second weighting points corresponding to the K prediction reference frames.
In the second mode, each of the K prediction reference frames is considered separately when determining the geometric information of the current point. Specifically, based on the coordinate information of the first points in the prediction nodes of each of the K prediction reference frames, the second weighting point corresponding to each prediction reference frame is determined, and the second context index corresponding to the ith coordinate axis is then determined based on the coordinate information of the second weighting points corresponding to the prediction reference frames, so that the second context index is predicted accurately and the decoding efficiency of the point cloud is further improved.
In the embodiment of the present application, the specific manner of determining the second weighting point corresponding to each of the K prediction reference frames by the decoding end is the same, and for convenience of description, the j-th prediction reference frame of the K prediction reference frames is taken as an example for illustration.
In the embodiment of the present application, the current node has at least one prediction node in the jth prediction reference frame, so the second weighting point corresponding to the jth prediction reference frame is determined based on the coordinate information of the first points included in the at least one prediction node in the jth prediction reference frame.
For example, if the jth prediction reference frame includes 2 prediction nodes of the current node, denoted prediction node 1 and prediction node 2, the coordinate information of the first point included in prediction node 1 and the coordinate information of the first point included in prediction node 2 are weighted to obtain the second weighting point corresponding to the jth prediction reference frame.
Before weighting the coordinate information of the first points included in the prediction nodes in the jth prediction reference frame, the decoding end first determines the first weight corresponding to each prediction node in the jth prediction reference frame. The process of determining the first weight may refer to the description of the foregoing embodiments, and is not repeated here.
And then, the decoding end carries out weighting processing on the coordinate information of the first point included in the prediction node in the j-th prediction reference frame based on the first weight to obtain a second weighting point corresponding to the j-th prediction reference frame.
In one example, based on the first weight, the coordinate information of the first point included in the prediction node in the jth prediction reference frame is weighted and averaged to obtain a second weighted point corresponding to the jth prediction reference frame.
The above describes the process of determining the second weighting point corresponding to the jth prediction reference frame; the second weighting point corresponding to each of the other prediction reference frames among the K prediction reference frames may be determined in the same manner.
After determining the second weighting point corresponding to each of the K predicted reference frames, the decoding end executes the step S102-A1-321-23.
The embodiment of the present application does not limit the specific manner of determining the second context index corresponding to the ith coordinate axis based on the second weighting points corresponding to the K prediction reference frames.
In some embodiments, the decoding end determines an average value of the coordinate information of the second weighted points corresponding to the K prediction reference frames on the ith coordinate axis, and determines the second context index corresponding to the ith coordinate axis based on the average value.
In some embodiments, the above S102-A1-321-23 includes the steps of S102-A1-321-231 to S102-A1-321-233 as follows:
S102-A1-321-231, determining second weights corresponding to K prediction reference frames;
S102-A1-321-232, weighting the coordinate information of the second weighting points corresponding to the K prediction reference frames based on the second weights to obtain a third weighting point;
S102-A1-321-233, determining the second context index corresponding to the ith coordinate axis based on the coordinate information of the third weighting point on the ith coordinate axis.
In this embodiment, the decoding end may determine the second weight corresponding to each of the K prediction reference frames by referring to the method of the foregoing embodiment. Then, the coordinate information of the second weighting points corresponding to the K prediction reference frames is weighted based on the second weights to obtain the third weighting point.
For example, let K=2, i.e., the current frame to be decoded has 2 prediction reference frames, which include a forward frame and a backward frame of the current frame to be decoded; let the second weight corresponding to the forward frame be W1 and the second weight corresponding to the backward frame be W2. Then, based on W1 and W2, the coordinate information of the second weighting point corresponding to the forward frame and that of the second weighting point corresponding to the backward frame are weighted to obtain the coordinate information of the third weighting point.
In one example, the coordinate information of the second weighting points corresponding to the K prediction reference frames is weighted-averaged based on the second weights to obtain the coordinate information of the third weighting point.
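The two-level weighting of this second mode can be sketched as follows: first points are averaged within each reference frame using the first weights, and the resulting second weighting points are then averaged across frames using the second weights. The weight values and the data layout are illustrative assumptions.

    def weighted_avg(values, weights):
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)

    def third_weighting_point(frames):
        # frames: (first_weights, first_point_coords, second_weight) per
        # prediction reference frame, coordinates taken on the ith axis.
        second_points = [weighted_avg(coords, w1) for w1, coords, _ in frames]
        second_weights = [w2 for _, _, w2 in frames]
        return weighted_avg(second_points, second_weights)

    # Forward frame with two prediction nodes, backward frame with one.
    p3 = third_weighting_point([([1.0, 1.0], [12, 14], 2.0),
                                ([1.0], [13], 1.0)])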
After determining the coordinate information of the third weighting point based on the above steps, the decoding end determines the second context index corresponding to the ith coordinate axis based on the coordinate information of the third weighting point on the ith coordinate axis.
As can be seen from the above, the third weighting point is obtained by weighting the first points in the prediction nodes of each prediction reference frame, where each bit of a first point in a prediction node can take only 2 values, 0 or 1. Thus, in some embodiments, the value of each bit of the third weighting point obtained by this weighting is also constrained to be 0 or 1. In this way, when decoding the ith bit of the current point on the ith coordinate axis, the second context index corresponding to the ith bit on the ith coordinate axis is determined based on the value of the ith bit of the third weighting point on the ith coordinate axis. For example, if the value of the ith bit of the third weighting point on the ith coordinate axis is 0, the second context index corresponding to the ith bit on the ith coordinate axis is determined to be 0; if that value is 1, the second context index is determined to be 1. Finally, the decoding end determines the context model corresponding to the ith bit on the ith coordinate axis based on the first context index and/or the second context index corresponding to that bit, and uses this context model to predictively decode the value of the ith bit of the current point on the ith coordinate axis.
The decoding end determines a context model based on the first context index and/or the second context index after determining the first context index and/or the second context index based on the steps, and decodes the coordinate information of the current point by using the context model.
In one example, assume that the DCM mode information of a prediction node is PredDCMode, the number of points contained in the prediction node is PredNumPoints, and the geometric information of the first point in the prediction node is predPointPos. It is assumed that the decoding end performs predictive decoding on the geometric information of the current point by using the IDCM mode of the prediction node and the geometric information of the point in the prediction node, that is, the geometric decoding information of the prediction node used by the decoding end includes the following two kinds:
1) Predicting an IDCM pattern of a node;
2) The geometric information of the point (i.e. the first point) in the prediction node, i.e. the bit information (0 or 1) of that point at the corresponding precision.
For example, under the GPCC framework, the IDCM mode of a prediction node includes PredDCMode (0, 1, 2). Under the AVS framework, the IDCM mode of a prediction node includes PredDCMode (0, 1).
Assuming that the number of points in the current node is numPoints, the geometric information of each point is PointPos, and the bit precision depth to be decoded is nodeSizeLog2, the geometric information decoding process of each point in the current node is as follows:
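The pseudocode of this decoding loop is not reproduced in the text above, so the following is only a hedged reconstruction built from the variables already named (numPoints, nodeSizeLog2, ctx1, ctx2); decode_bit stands in for the context-adaptive entropy decoder and ctx2_fn for the per-axis, per-bit second context index derivation, both of which are assumptions rather than definitions from the application.

    def decode_node_points(num_points, node_size_log2, ctx1, ctx2_fn, decode_bit):
        # Decode the geometric information (PointPos) of every point in the
        # current IDCM node, bit by bit on each coordinate axis.
        points = []
        for _ in range(num_points):
            pos = [0, 0, 0]
            for axis in range(3):                       # X, Y, Z
                for t in reversed(range(node_size_log2)):
                    ctx2 = ctx2_fn(axis, t)             # second context index
                    bit = decode_bit(ctx1, ctx2)        # context-adaptive decode
                    pos[axis] = (pos[axis] << 1) | bit
            points.append(pos)
        return points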
The geometric information of each point in the current node can be obtained through the above decoding process, where ctx1 is the first context index and ctx2 is the second context index.
In the point cloud decoding method provided by the embodiment of the application, when decoding a current node in the current frame to be decoded, N prediction nodes of the current node are determined in the prediction reference frame of the current frame to be decoded, and the coordinate information of the points in the current node is predictively decoded based on the geometric decoding information of the points in the N prediction nodes. That is, the embodiment of the application optimizes the node during DCM direct decoding: by considering the temporal correlation between adjacent frames, the geometric information of the points in the to-be-decoded IDCM node (i.e. the current node) is predictively decoded using the geometric information of the prediction nodes in the prediction reference frame, thereby improving the geometric information decoding efficiency of the point cloud.
The decoding end is taken as an example above to describe the point cloud decoding method provided by the embodiment of the present application in detail, and the encoding end is taken as an example below to describe the point cloud encoding method provided by the embodiment of the present application.
Fig. 17 is a schematic flow chart of a point cloud encoding method according to an embodiment of the present application. The point cloud encoding method of the embodiment of the application can be completed by the point cloud encoding device shown in the above fig. 3 or fig. 4A or fig. 8A.
As shown in fig. 17, the point cloud encoding method according to the embodiment of the present application includes:
S201, determining N prediction nodes of a current node in a prediction reference frame of a current frame to be coded.
The current node is a node to be encoded in the current frame to be encoded.
From the above, it can be seen that the point cloud includes geometric information and attribute information, and the encoding of the point cloud includes geometric encoding and attribute encoding. The embodiment of the application relates to geometric coding of point clouds.
In some embodiments, the geometric information of the point cloud is also referred to as position information of the point cloud, and thus, the geometric encoding of the point cloud is also referred to as position encoding of the point cloud.
In the octree-based encoding method, the encoding end constructs an octree structure of the point cloud based on the geometric information of the point cloud. As shown in fig. 11, a minimum cuboid enclosing the point cloud is used as the bounding box; octree division is first performed on the bounding box to obtain 8 nodes, and the occupied nodes, i.e., the nodes containing points, among the 8 nodes are further divided by octree until the division reaches voxel-level positions, for example, cubes of 1×1×1. The point cloud octree structure obtained by the division includes multiple layers of nodes, for example, N layers, and during encoding, the occupancy information of each layer is encoded layer by layer until the voxel-level leaf nodes of the last layer are encoded. That is, in octree coding, the point cloud is divided by the octree, the points in the point cloud finally fall into the voxel-level leaf nodes of the octree, and the encoding of the point cloud is achieved by encoding the entire octree.
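As a concrete illustration of one level of this division, here is a minimal sketch with an assumed point representation (integer XYZ triples) and node layout; it only groups points into the 8 octants and keeps the occupied ones, which is the step repeated recursively down to voxel size.

    def subdivide(origin, size, points):
        # One level of octree division: assign each point to one of the 8
        # child octants and keep only the occupied (non-empty) children.
        half = size // 2
        children = {}
        for p in points:
            octant = ((((p[0] - origin[0]) >= half) << 2) |
                      (((p[1] - origin[1]) >= half) << 1) |
                      ((p[2] - origin[2]) >= half))
            children.setdefault(octant, []).append(p)
        return children   # occupied octants are subdivided further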
The octree-based geometric information coding mode achieves a high compression rate for spatially correlated points, while for points at isolated positions in the geometric space, using a direct coding mode can greatly reduce complexity and improve coding efficiency.
The direct coding mode directly encodes the geometric information of the points included in a node; if a node includes many points, the compression effect of direct coding is poor. Therefore, for a node in the octree, before direct coding is performed, it is first determined whether the node can adopt the direct coding scheme. If it is determined that the node can be coded in the direct coding mode, the geometric information of the points included in the node is directly coded. If it is determined that the node cannot be coded in the direct coding mode, the node continues to be divided in the octree manner.
Specifically, the encoding end first determines whether the node is eligible for direct encoding; if so, it determines whether the number of points in the node is less than or equal to a preset threshold, and if the number of points in the node is less than or equal to the preset threshold, it determines that the node can be encoded in the direct coding mode. Then, the number of points included in the node and the geometric information of the points are encoded into the code stream.
At present, when the position information of the points in the current node is predictively coded, inter-frame information is not considered, so the coding performance of the point cloud is low.
In order to solve the above problems, in the embodiment of the present application, the encoding end performs predictive encoding on the position information of the point in the current node based on the inter-frame information corresponding to the current node, thereby improving the encoding efficiency and encoding performance of the point cloud.
Specifically, the coding end first determines N prediction nodes of the current node in the prediction reference frame of the current frame to be coded.
It should be noted that the current frame to be encoded is a point cloud frame, and in some embodiments, the current frame to be encoded is also referred to as the current frame or the current point cloud frame to be encoded. The current node may be understood as any non-leaf, non-empty node in the current frame to be encoded. That is, the current node is not a leaf node of the octree corresponding to the current frame to be encoded, i.e., it is an arbitrary intermediate node of the octree, and it is a non-empty node, i.e., it includes at least 1 point.
In the embodiment of the application, when the encoding end encodes a current node in a current frame to be encoded, firstly determining a prediction reference frame of the current frame to be encoded, and determining N prediction nodes of the current node in the prediction reference frame. For example, fig. 12 shows a predicted node of the current node in the predicted reference frame.
It should be noted that, in the embodiment of the present application, the number of prediction reference frames of the current frame to be encoded is not limited, for example, the current frame to be encoded has one prediction reference frame, or the current frame to be encoded has a plurality of prediction reference frames. Meanwhile, the embodiment of the application does not limit the number N of the predicted nodes of the current node, and is specifically determined according to actual needs.
The embodiment of the application also does not limit the specific mode of determining the prediction reference frame of the current frame to be coded.
In some embodiments, the previous frame or frames of the current frame to be encoded are determined as prediction reference frames for the current frame to be encoded.
For example, if the current frame to be encoded is a P frame, the inter reference frames of the P frame include the frame preceding it (i.e., a forward frame); thus, the frame preceding the current frame to be encoded (i.e., the forward frame) may be determined as the prediction reference frame of the current frame to be encoded.
For another example, if the current frame to be encoded is a B frame, the inter reference frames of the B frame include the frame preceding it (i.e., a forward frame) and the frame following it (i.e., a backward frame); thus, the frame preceding the current frame to be encoded (i.e., the forward frame) may be determined as the prediction reference frame of the current frame to be encoded.
In some embodiments, the next or next several encoded frames of the current frame to be encoded are determined as prediction reference frames for the current frame to be encoded.
For example, if the current frame to be encoded is a B frame, a frame subsequent to the current frame to be encoded may be determined as a prediction reference frame of the current frame to be encoded.
In some embodiments, a previous frame or a plurality of previous encoded frames of the current frame to be encoded and a next frame or a plurality of next encoded frames of the current frame to be encoded are determined as prediction reference frames of the current frame to be encoded.
For example, if the current frame to be encoded is a B frame, a frame before and a frame after the current frame to be encoded may be determined as a prediction reference frame of the current frame to be encoded, where the current frame to be encoded has 2 prediction reference frames.
The specific process of determining the N prediction nodes of the current node in the prediction reference frames of the current frame to be encoded in S201 is described below, taking the case where the current frame to be encoded has K prediction reference frames as an example.
In some embodiments, the encoding end selects at least one prediction reference frame from the K prediction reference frames based on the occupation information of the node in the current frame to be encoded and the occupation information of the node in each of the K prediction reference frames, and searches for the prediction node of the current node in the at least one prediction reference frame. For example, at least one prediction reference frame, the occupation information of which is closest to the occupation information of the node of the current frame to be coded, is selected from the K prediction reference frames, and then the prediction node of the current node is searched in the at least one prediction reference frame.
In some embodiments, the encoding end may determine N predicted nodes of the current node by the following steps S201-A1 and S201-A2:
S201-A1, for the kth prediction reference frame among the K prediction reference frames, determining at least one prediction node of the current node in the kth prediction reference frame, where k is a positive integer less than or equal to K, and K is a positive integer;
S201-A2, N prediction nodes of the current node are determined based on at least one prediction node of the current node in the K prediction reference frames.
In this embodiment, the encoding end determines at least one prediction node of the current node from each of the K prediction reference frames, and finally aggregates the at least one prediction node of each of the K prediction reference frames to obtain the N prediction nodes of the current node.
The process of determining the at least one prediction node of the current node in each of the K prediction reference frames is the same; for convenience of description, the kth prediction reference frame among the K prediction reference frames is taken as an example.
The following describes a specific procedure for determining at least one predicted node of the current node in the kth predicted reference frame in S201-A1.
The embodiment of the application does not limit the specific mode of determining at least one prediction node of the current node in the kth prediction reference frame by the coding end.
In mode one, one prediction node of the current node is determined in the kth prediction reference frame. For example, a node of the kth prediction reference frame at the same partition depth as the current node is determined as the prediction node of the current node.
In one example, if the number of prediction nodes in the kth prediction reference frame is 1, the node whose occupancy information differs least from the occupancy information of the current node may be selected from the nodes of the kth prediction reference frame at the same partition depth as the current node; denoting this node as node 1, node 1 is the prediction node of the current node in the kth prediction reference frame.
In another example, if the number of prediction nodes of the current node in the kth prediction reference frame is greater than 1, the above-determined node 1, together with at least one neighbor node of node 1 in the kth prediction reference frame, for example, at least one node coplanar, collinear, or co-point with node 1, is determined as the prediction nodes of the current node in the kth prediction reference frame.
In a second mode, the step of determining at least one prediction node of the current node in the kth prediction reference frame in S201-A1 includes the following steps of S201-a11 to S201-a 13:
S201-A11, determining M neighbor nodes of the current node in the current frame to be encoded, where the M neighbor nodes include the current node, and M is a positive integer;
S201-A12, for the ith neighbor node among the M neighbor nodes, determining the corresponding node of the ith neighbor node in the kth prediction reference frame, where i is a positive integer less than or equal to M;
S201-A13, determining at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M neighbor nodes in the kth prediction reference frame.
In this implementation, before determining the at least one prediction node of the current node in the kth prediction reference frame, the encoding end first determines M neighbor nodes of the current node in the current frame to be encoded, where the M neighbor nodes include the current node itself.
It should be noted that the embodiment of the present application does not limit the specific manner of determining the M neighbor nodes of the current node.
In one example, the M neighbor nodes of the current node include at least one of the nodes that are coplanar, collinear, and co-point with the current node in the current frame to be encoded. As shown in fig. 13, the current node has 6 coplanar nodes, 12 collinear nodes, and 8 co-point nodes.
In another example, in addition to at least one of the nodes that are coplanar, collinear, and co-point with the current node, the M neighbor nodes of the current node may include other nodes within the reference neighborhood, which is not limited in the embodiment of the present application.
After determining the M neighbor nodes of the current node in the current frame to be encoded based on the above steps, the encoding end determines the corresponding node of each neighbor node in the kth prediction reference frame, and further determines the at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M neighbor nodes in the kth prediction reference frame.
The embodiment of the application does not limit the specific implementation mode of S201-A13.
In one possible implementation, at least one corresponding node is selected from the corresponding nodes of the M neighbor nodes in the kth prediction reference frame as the at least one prediction node of the current node in the kth prediction reference frame. For example, among the corresponding nodes of the M neighbor nodes in the kth prediction reference frame, the at least one corresponding node whose occupancy information differs least from the occupancy information of the current node is selected as the at least one prediction node of the current node in the kth prediction reference frame. The difference between the occupancy information of a corresponding node and that of the current node may be determined, for example, by performing an exclusive-or (XOR) operation on the two pieces of occupancy information and taking the XOR result as the difference.
In another possible implementation, the encoding end determines the corresponding nodes of the M neighbor nodes in the kth prediction reference frame as the at least one prediction node of the current node in the kth prediction reference frame. For example, each of the M neighbor nodes has one corresponding node in the kth prediction reference frame, giving M corresponding nodes in total, and these M corresponding nodes are determined as the prediction nodes of the current node in the kth prediction reference frame, M prediction nodes in total.
The above description is directed to a process of determining at least one predicted node of a current node in a kth predicted reference frame. In this way, the encoding end can determine at least one prediction node of the current node in each of the K prediction reference frames in the same manner as described above.
After determining the at least one prediction node in each of the K prediction reference frames, the encoding end executes the above step S201-A2, that is, determines the N prediction nodes of the current node based on the at least one prediction node of the current node in each of the K prediction reference frames.
In one example, at least one predicted node of the current node in the K predicted reference frames is determined as N predicted nodes of the current node.
For example, K=2, i.e., the K prediction reference frames include a first prediction reference frame and a second prediction reference frame. Assuming that the current node has 2 prediction nodes in the first prediction reference frame and 3 prediction nodes in the second prediction reference frame, it can be determined that the current node has 5 prediction nodes, i.e., N=5.
In another example, N predicted nodes of the current node are filtered out from at least one predicted node of the K predicted reference frames.
With continued reference to the above example, assume that K=2, i.e., the K prediction reference frames include a first prediction reference frame and a second prediction reference frame, and that the current node has 2 prediction nodes in the first prediction reference frame and 3 prediction nodes in the second prediction reference frame. From these 5 prediction nodes, 3 are selected as the final prediction nodes of the current node. For example, from the 5 prediction nodes, the 3 prediction nodes whose occupancy information differs least from the occupancy information of the current node are selected as the final prediction nodes of the current node.
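As an illustration of this screening step, here is a minimal sketch, assuming occupancy codes are held as 8-bit integers and the occupancy difference is measured by the XOR popcount described elsewhere in this application; the data layout and example values are illustrative.

    def select_prediction_nodes(current_occ, candidate_occs, n):
        # Keep the n candidates whose occupancy differs least (fewest
        # differing bits) from the current node's occupancy.
        return sorted(candidate_occs,
                      key=lambda occ: bin(current_occ ^ occ).count("1"))[:n]

    # Example: keep 3 of 5 candidate prediction nodes.
    final = select_prediction_nodes(0b11001101,
                                    [0b11001100, 0b00110010, 0b11001111,
                                     0b10000000, 0b11011101], 3)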
In the second mode, after determining the M neighbor nodes of the current node in the current frame to be encoded, the encoding end determines the corresponding node of each of the M neighbor nodes in the kth prediction reference frame, and further determines the at least one prediction node of the current node in the kth prediction reference frame based on the corresponding nodes of the M neighbor nodes.
In mode three, the step of determining at least one prediction node of the current node in the kth prediction reference frame in S201-A1 includes the following steps S201-B11 to S201-B13:
S201-B11, determining a corresponding node of the current node in a kth prediction reference frame;
S201-B12, determining at least one neighbor node of the corresponding node;
S201-B13, determining the at least one neighbor node as the at least one prediction node of the current node in the kth prediction reference frame.
In this mode three, for each of the K prediction reference frames, the encoding end first determines the corresponding node of the current node in that prediction reference frame. For example, corresponding node 1 of the current node in prediction reference frame 1 is determined, and corresponding node 2 of the current node in prediction reference frame 2 is determined. Then, the encoding end determines at least one neighbor node of each corresponding node. For example, at least one neighbor node of corresponding node 1 is determined in prediction reference frame 1, and at least one neighbor node of corresponding node 2 is determined in prediction reference frame 2. In this way, the at least one neighbor node of corresponding node 1 in prediction reference frame 1 can be determined as at least one prediction node of the current node in prediction reference frame 1, and the at least one neighbor node of corresponding node 2 in prediction reference frame 2 can be determined as at least one prediction node of the current node in prediction reference frame 2.
The process of determining the corresponding node of the ith neighbor node in the kth prediction reference frame in S201-A12 of mode two is basically the same as the process of determining the corresponding node of the current node in the kth prediction reference frame in S201-B11 of mode three. For convenience of description, the ith neighbor node and the current node are both denoted as the ith node, and the specific procedure for determining the corresponding node of the ith node in the kth prediction reference frame is described below.
The encoding end determines the corresponding node of the ith node in the kth prediction reference frame in at least the following ways:
In mode 1, one node of the kth prediction reference frame, which has the same partition depth as the ith node, is determined as the corresponding node of the ith node.
Mode 2, the above-mentioned S201-a12 and S201-B11 include the following steps:
S201-A121, determining a father node of an ith node as the ith father node in a current frame to be coded;
S201-A122, determining a matching node of the ith parent node in the kth predicted reference frame as the ith matching node;
S201-A123, determining one of the child nodes of the ith matching node as the corresponding node of the ith node in the kth prediction reference frame.
In this mode 2, for the ith node, the encoding end determines the parent node of the ith node in the current frame to be encoded, and further determines the matching node of that parent node in the kth prediction reference frame. For convenience of description, the parent node of the ith node is denoted as the ith parent node, and the matching node of the ith parent node in the kth prediction reference frame is denoted as the ith matching node. Then, one of the child nodes of the ith matching node is determined as the corresponding node of the ith node in the kth prediction reference frame, so that the corresponding node of the ith node in the kth prediction reference frame is accurately determined.
The following describes a specific procedure for determining a matching node of the ith parent node in the kth predicted reference frame in S201-a 122.
The embodiment of the application does not limit the specific mode of determining the matching node of the ith father node in the kth prediction reference frame by the coding end.
In some embodiments, the partition depth of the ith parent node in the current frame to be encoded is determined; for example, the ith parent node is at layer 2 of the octree of the current frame to be encoded. Thus, the encoding end may determine one of the nodes of the kth prediction reference frame at the same partition depth as the ith parent node as the matching node of the ith parent node in the kth prediction reference frame. For example, one of the nodes at layer 2 of the kth prediction reference frame is determined as the matching node of the ith parent node in the kth prediction reference frame.
In some embodiments, the encoding end determines the matching node of the ith parent node in the kth prediction reference frame based on the occupancy information of the ith parent node. Specifically, the occupancy information of the ith parent node in the current frame to be encoded has been encoded, and the occupancy information of each node in the kth prediction reference frame has also been encoded. Thus, the encoding end can search for the matching node of the ith parent node in the kth prediction reference frame based on the occupancy information of the ith parent node.
For example, a node having the smallest difference between the occupancy information of the kth prediction reference frame and the occupancy information of the ith parent node is determined as a matching node of the ith parent node in the kth prediction reference frame.
For example, assuming that the occupancy information of the ith parent node is 11001101, a node having the smallest difference between the occupancy information and the occupancy information 11001101 is queried in the kth prediction reference frame. Specifically, the encoding end performs exclusive or operation on the occupation information of the ith father node and the occupation information of each node in the kth prediction reference frame, and determines the node with the smallest exclusive or operation result in the kth prediction reference frame as the matching node of the ith father node in the kth prediction reference frame.
Based on the steps, the encoding end can determine the matching node of the ith father node in the kth predicted reference frame. For ease of description, this matching node is denoted as the i-th matching node.
Then, the encoding end determines one of the child nodes of the ith matching node as the corresponding node of the ith node in the kth prediction reference frame.
For example, the encoding end determines a default child node among the child nodes of the ith matching node as the corresponding node of the ith node in the kth prediction reference frame; for instance, the 1st child node of the ith matching node is determined as the corresponding node of the ith node in the kth prediction reference frame.
For another example, the encoding end determines the first sequence number of the ith node among the child nodes of its parent node, and determines the child node with that first sequence number among the child nodes of the ith matching node as the corresponding node of the ith node in the kth prediction reference frame. Illustratively, as shown in fig. 14, the ith node is the 2nd child node of the ith parent node, so the first sequence number is 2; thus, the 2nd child node of the ith matching node can be determined as the corresponding node of the ith node.
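Putting the parent matching of S201-A122 and the child mapping of S201-A123 together, a hedged sketch follows; it assumes occupancy codes stored as 8-bit integers, candidate parents listed with their child nodes, and a 0-based sequence number, none of which are fixed by the application, and the clamping fallback when the matching parent has fewer children is purely an assumption.

    def find_corresponding_node(parent_occ, ref_parents, child_seq):
        # ref_parents: (occupancy_byte, children) pairs in the kth prediction
        # reference frame at the same partition depth as the ith parent node.
        # Match the parent by the smallest XOR popcount of occupancy codes.
        best_occ, best_children = min(
            ref_parents, key=lambda n: bin(parent_occ ^ n[0]).count("1"))
        # Take the child with the same sequence number as the ith node,
        # clamping if the matching node has fewer children (an assumption).
        return best_children[min(child_seq, len(best_children) - 1)]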
The above describes the process of determining the corresponding node of the ith neighbor node among the M neighbor nodes in the kth prediction reference frame, and the process of determining the corresponding node of the current node in the kth prediction reference frame. In this way, the encoding end can determine the N prediction nodes of the current node in the prediction reference frames according to mode two or mode three.
Based on the above steps, the encoding end determines N prediction nodes of the current node in the prediction reference frame of the current frame to be encoded, and then performs the following step S202.
S202, based on geometric coding information of points in N prediction nodes, performing prediction coding on the position information of the points in the current node.
Because adjacent frames of the point cloud are correlated, the embodiment of the application refers to inter-frame correlation information when predictively encoding the position information of the points in the current node. Specifically, the position information of the points in the current node is predictively encoded based on the geometric coding information of the points in the N prediction nodes of the current node, thereby improving the coding efficiency and coding performance of the point cloud.
In the embodiment of the present application, based on the geometric coding information of N prediction nodes, the coordinate information of the point in the current node is predictively coded, which can be understood as using the geometric coding information of N prediction nodes as context, and the coordinate information of the point in the current node is predictively coded. For example, the encoding end determines the index of the context model based on the geometric encoding information of the N prediction nodes, further determines the target context model from a plurality of preset context models based on the index of the context model, and performs predictive encoding on the coordinate information of the point in the current node by using the context model.
In the embodiment of the present application, the process of performing predictive coding on the coordinate information of each point in the current node is basically the same based on the geometric coding information of N prediction nodes, and for convenience of description, the process of performing predictive coding on the coordinate information of the current point in the current node is described herein as an example.
In some embodiments, the step S202 includes the following steps:
S202-A, determining an index of a context model based on geometric coding information of N prediction nodes;
S202-B, determining a context model based on the index of the context model;
S202-C, performing predictive coding on the coordinate information of the current point in the current node by using a context model.
In the embodiment of the application, a plurality of context models, for example Q context models, are set for the encoding process of the coordinate information; the embodiment of the application does not limit the specific number of context models corresponding to the coordinate information, as long as Q is greater than 1. That is, in the embodiment of the present application, an optimal context model is selected from the Q context models to predictively encode the coordinate information of the current point in the current node, so as to improve the encoding efficiency of the coordinate information of the current point.
The geometric coding information of the prediction node in the embodiment of the application can be understood as any information involved in the geometric coding process of the prediction node. For example, the number of points included in the prediction node, the occupation information of the prediction node, the coding mode of the prediction node, the geometric information of the points in the prediction node, and the like are included.
In some embodiments, the geometric coding information of the prediction node includes direct coding information of the prediction node and/or coordinate information of a point in the prediction node, where the direct coding information of the prediction node is used to indicate whether the prediction node satisfies a condition for coding in a direct coding manner.
Based on this, the above S202-A includes the steps of S202-A1 as follows:
S202-A1, determining a first context index based on direct coding information of N prediction nodes, and/or determining a second context index based on coordinate information of points of the N prediction nodes.
Correspondingly, the step S202-B includes the following steps of S202-B:
S202-B1, selecting a context model from a plurality of preset context models based on the first context index and/or the second context index.
In this embodiment, if the geometric coding information of the prediction nodes includes the direct coding information of the prediction nodes and/or the coordinate information of the points in the prediction nodes, the encoding end may determine the first context index based on the direct coding information of the N prediction nodes, and/or determine the second context index based on the coordinate information of the points in the N prediction nodes, and further select the final context model from the preset context models based on the first context index and/or the second context index.
It will be appreciated that in this embodiment, the manner in which the encoding end determines the context model includes, but is not limited to, the following:
In one possible implementation manner, if the geometric coding information of the prediction node includes direct coding information of the prediction node, the process of determining the context model may be to determine a first context index based on the direct coding information of N prediction nodes, and further select a final context model from a plurality of preset context models based on the first context index to code the coordinate information of the current point.
For example, the encoding end selects a final context model from the context models shown in table 4 based on the first context index.
In another possible implementation manner, if the geometric coding information of the prediction node includes coordinate information of points in the prediction node, the process of determining the context model may be to determine a second context index based on the coordinate information of points in the N prediction nodes, and further select a final context model from a plurality of preset context models based on the second context index to code the coordinate information of the current point.
For example, the encoding end selects a final context model from the context models shown in table 4 based on the second context index.
In another possible implementation, if the geometric coding information of the prediction nodes includes both the direct coding information of the prediction nodes and the coordinate information of the points in the prediction nodes, the process of determining the context model may be: determine the first context index based on the direct coding information of the N prediction nodes, determine the second context index based on the coordinate information of the points in the N prediction nodes, and then select the final context model from the plurality of preset context models based on the first context index and the second context index to encode the coordinate information of the current point.
Illustratively, the correspondence of the first context index, the second context index, and the context model is shown in table 5.
In this implementation, the encoding end determines the first context index based on the direct encoding information of the N prediction nodes, determines the second context index based on the coordinate information of the points in the N prediction nodes, and then looks up table 5 above to obtain the final context model. For example, the encoding end determines that the first context index is first context index 2 based on the direct encoding information of the N prediction nodes, determines that the second context index is second context index 3 based on the coordinate information of the points in the N prediction nodes, then looks up table 5 to obtain the final context model, context model 23, and encodes the coordinate information of the current point using context model 23.
The specific process of determining the first context index based on the direct coding information of the N prediction nodes in S202-A1 is described below.
In the embodiment of the present application, the manner of determining the first context index by the encoding end includes, but is not limited to, the following:
In one mode, the step S202-A1 includes the following steps S202-A1-11 and S202-A1-12:
S202-A1-11, determining a first numerical value corresponding to any one of N prediction nodes based on direct coding information of the prediction nodes.
In this aspect, for each of the N prediction nodes, a first value corresponding to the prediction node is determined based on the direct encoding information of the prediction node, and finally, a first context index is determined based on the first values corresponding to the N prediction nodes.
The process of determining the first value corresponding to the predicted node is described below.
As can be seen from the above, the direct coding information of the prediction node is used to indicate whether the prediction node satisfies the condition for coding by the direct coding scheme. The embodiment of the application does not limit the specific content of the direct coding information.
In some embodiments, the direct encoding information includes the number of points included in the prediction node. In this way, the first value corresponding to the prediction node may be determined based on the number of points included in the prediction node.
In one example, under the GPCC framework, when the number of points included in the predicted node is greater than or equal to 2, the first value corresponding to the predicted node is determined to be 1, and if the number of points included in the predicted node is less than 2, the first value corresponding to the predicted node is determined to be 0. Under the AVS framework, when the number of points included in the prediction node is greater than or equal to 1, determining that the first numerical value corresponding to the prediction node is 1, and when the number of points included in the prediction node is less than 1, determining that the first numerical value corresponding to the prediction node is 0.
In another example, the number of points included in the predicted node is determined to be a first value corresponding to the predicted node, for example, when the predicted node includes 2 points, the first value corresponding to the predicted node is determined to be 2.
In some embodiments, the direct coding information of the prediction node includes the direct coding mode of the prediction node, and the above S202-A1-11 includes: determining the number of the direct coding mode of the prediction node as the first value corresponding to the prediction node.
For example, under the GPCC framework, if the direct coding mode of the prediction node is mode 0, it is determined that the first value corresponding to the prediction node is 0. If the direct coding mode of the prediction node is mode 1, determining that the first value corresponding to the prediction node is 1. If the direct coding mode of the prediction node is mode 2, determining that the first value corresponding to the prediction node is 2.
For another example, under the AVS framework, if the direct coding mode of the prediction node is mode 0, the first value corresponding to the prediction node is determined to be 0. If the direct coding mode of the prediction node is mode 1, the first value corresponding to the prediction node is determined to be 1.
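As a rough illustration, the above variants of deriving the first value can be sketched as follows (a minimal sketch; the framework flag and the function names are illustrative assumptions):

    def first_value_from_point_count(num_points, framework="GPCC"):
        # threshold variant: at least 2 points under GPCC, at least 1 under AVS
        threshold = 2 if framework == "GPCC" else 1
        return 1 if num_points >= threshold else 0

    def first_value_from_count(num_points):
        # variant that uses the point count itself as the first value
        return num_points

    def first_value_from_mode(PredDCMode):
        # variant that uses the number of the direct coding mode as the first
        # value (0, 1 or 2 under GPCC; 0 or 1 under AVS)
        return PredDCMode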
Based on the above steps, the encoding end determines a first value corresponding to each of the N prediction nodes, and then executes the following steps S202-A1-12.
S202-A1-12, determining a first context index based on first numerical values corresponding to the N prediction nodes.
The encoding end determines the first context index based on the first values corresponding to the N prediction nodes after determining the first values corresponding to the N prediction nodes based on the steps.
Wherein, based on the first values corresponding to the N prediction nodes, determining the first context index at least includes the following implementation manners:
in mode 1, the average value of the first values corresponding to the N prediction nodes is determined as the first context index.
Mode 2, S202-A1-12 includes the steps of S202-A1-121 to S202-A1-123 as follows:
S202-A1-121, determining a first weight corresponding to a prediction node;
S202-A1-122, carrying out weighting processing on first numerical values corresponding to N prediction nodes based on first weights to obtain first weighted prediction values;
S202-A1-123, determining a first context index based on the first weighted prediction value.
In this mode 2, the current node corresponds to a plurality of prediction nodes, namely the N prediction nodes. When the first context index is determined based on the first values corresponding to the N prediction nodes, a weight, namely a first weight, may be determined for each of the N prediction nodes. The first value corresponding to each prediction node is then weighted based on the first weight of that prediction node, and the first context index is determined from the final weighted result, thereby improving the accuracy of determining the first context index based on the geometric coding information of the N prediction nodes.
The embodiment of the application does not limit the first weights corresponding to the N prediction nodes.
In some embodiments, the first weight corresponding to each of the N prediction nodes is a preset value. As can be seen from the above, the N prediction nodes are determined based on the M domain nodes of the current node. Assuming that prediction node 1 is the prediction node corresponding to domain node 1: if domain node 1 is a coplanar node of the current node, the first weight of prediction node 1 is preset weight 1; if domain node 1 is a collinear node of the current node, the first weight of prediction node 1 is preset weight 2; and if domain node 1 is a co-point node of the current node, the first weight of prediction node 1 is preset weight 3.
In some embodiments, for each of the N prediction nodes, the first weight corresponding to the prediction node is determined based on the distance between the domain node corresponding to the prediction node and the current node. For example, the smaller the distance between the domain node and the current node, the stronger the inter-frame correlation between the prediction node corresponding to the domain node and the current node, and thus the larger the first weight of the prediction node.
In some embodiments, after the weight corresponding to each of the N prediction nodes is determined based on the above steps, the weights are normalized, and the normalized weight is taken as the final first weight of each prediction node.
The embodiment of the present application does not limit the specific manner of weighting the first values corresponding to the N prediction nodes based on the first weights to obtain the first weighted prediction value.
In one example, based on the first weight, a first numerical value corresponding to the N prediction nodes is weighted averaged to obtain a first weighted prediction value.
In another example, based on the first weights, first values corresponding to the N prediction nodes are weighted and summed to obtain a first weighted prediction value.
Based on the above steps, after the first weighted prediction value is determined, the first context index is determined based on the first weighted prediction value. That is, S202-A1-123 includes at least the following examples:
Example 1, a first weighted prediction value is determined as a first context index.
Example 2, a weighted prediction value range in which the first weighted prediction value is located is determined, and an index corresponding to the range is determined as the first context index.
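Illustratively, mode 2 can be sketched as follows (a minimal sketch assuming the first weights are normalized as described above; the range boundaries are illustrative assumptions):

    def first_context_index(first_values, first_weights):
        # normalize the first weights so that they sum to 1
        total = float(sum(first_weights))
        norm = [w / total for w in first_weights]
        # first weighted prediction value: weighted average of the first values
        pred = sum(w * v for w, v in zip(norm, first_values))
        # Example 2: map the weighted value to the index of the range it falls in
        # (Example 1 would use the weighted value itself as the index)
        thresholds = [0.5, 1.5]  # assumed range boundaries
        return sum(pred >= t for t in thresholds)

    # e.g. three prediction nodes with first values 1, 0, 2 and equal weights
    print(first_context_index([1, 0, 2], [1, 1, 1]))  # -> 1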
The above describes the process in which the encoding end weights the N prediction nodes to obtain the first context index.
In some embodiments, the encoding end may also determine the first context index in the following manner.
In mode two, if K is greater than 1, a second weighted prediction value corresponding to each of the K prediction reference frames is determined, and the first context index is then determined based on the second weighted prediction values corresponding to the K prediction reference frames. At this time, the above S202-A1 includes the following steps S202-A1-21 to S202-A1-23:
S202-A1-21, for the jth prediction reference frame of the K prediction reference frames, determining a first value corresponding to the prediction node of the current node in the jth prediction reference frame based on the direct coding information of that prediction node, where j is a positive integer less than or equal to K;
S202-A1-22, determining a first weight corresponding to a predicted node, and carrying out weighting processing on a first numerical value corresponding to the predicted node in a j-th predicted reference frame based on the first weight to obtain a second weighted predicted value corresponding to the j-th predicted reference frame;
S202-A1-23, determining a first context index based on second weighted prediction values corresponding to the K prediction reference frames.
In this second mode, each of the K prediction reference frames is considered as separate context information when determining the first context index. Specifically, direct coding information of a prediction node included in each of the K prediction reference frames is determined, a second weighted prediction value corresponding to each prediction reference frame is determined, and then a first context index is determined based on the second weighted prediction value corresponding to each prediction reference frame, so that accurate selection of the first context index is achieved, and coding efficiency of the point cloud is improved.
In the embodiment of the present application, the specific manner of determining the second weighted prediction value corresponding to each of the K prediction reference frames by the encoding end is the same, and for convenience of description, the j-th prediction reference frame of the K prediction reference frames is taken as an example for illustration.
In an embodiment of the application, the current node comprises at least one prediction node in a jth prediction reference frame, such that a first value of the at least one prediction node is determined based on direct coding information of the at least one prediction node in the jth prediction reference frame.
For example, if the jth prediction reference frame includes 2 prediction nodes of the current node, denoted as prediction node 1 and prediction node 2, the first value of prediction node 1 is determined based on the direct coding information of prediction node 1, and the first value of prediction node 2 is determined based on the direct coding information of prediction node 2. The process of determining the first value corresponding to a prediction node based on its direct coding information may refer to the description of the above embodiment; for example, the number (0, 1, or 2) of the direct coding mode of the prediction node is determined as the first value corresponding to the prediction node.
After determining a first numerical value of at least one prediction node included in the jth prediction reference frame, the encoding end determines first weights respectively corresponding to the at least one prediction node, and performs weighting processing on the first numerical value corresponding to the at least one prediction node based on the first weights to obtain a second weighted prediction value corresponding to the jth prediction reference frame.
In one example, based on the first weight, a first value corresponding to a prediction node in a jth prediction reference frame is weighted averaged to obtain a second weighted prediction value corresponding to the jth prediction reference frame.
In another example, based on the first weight, a first value corresponding to a prediction node in the jth predicted reference frame is weighted and summed to obtain a second weighted prediction value corresponding to the jth predicted reference frame.
The determining process of the first weight may refer to the description of the foregoing embodiments, which is not repeated herein.
The above describes the process of determining the second weighted prediction value corresponding to the jth prediction reference frame; the second weighted prediction values corresponding to the other prediction reference frames among the K prediction reference frames are determined in the same manner.
After determining the second weighted prediction value corresponding to each of the K prediction reference frames, the encoding end executes the step S202-A1-23.
The embodiment of the present application does not limit the specific manner of determining the first context index based on the second weighted prediction values corresponding to the K prediction reference frames.
In some embodiments, the encoding end determines an average value of the second weighted prediction values corresponding to the K prediction reference frames as the first context index.
In some embodiments, the encoding end determines second weights corresponding to the K prediction reference frames, and performs weighting processing on second weighted prediction values corresponding to the K prediction reference frames based on the second weights, so as to obtain the first context index.
In this embodiment, the encoding end first determines a second weight corresponding to each of the K prediction reference frames. The embodiment of the application does not limit the determination of the second weight corresponding to each prediction reference frame in the K prediction reference frames.
In some embodiments, the second weight corresponding to each of the K predicted reference frames is a preset value. As can be seen from the above, the K predicted reference frames are forward frames and/or backward frames of the current frame to be encoded. Assuming that the predicted reference frame 1 is a forward frame of the current frame to be encoded, the second weight corresponding to the predicted reference frame 1 is a preset weight 1, and if the predicted reference frame 1 is a backward frame of the current frame to be encoded, the second weight corresponding to the predicted reference frame 1 is a preset weight 2.
In some embodiments, the second weight corresponding to a prediction reference frame is determined based on the time gap between the prediction reference frame and the current frame to be encoded. In the embodiment of the present application, each point cloud frame includes time information, which may be the time at which the point cloud acquisition device acquired that frame. Based on this, the smaller the time difference between the prediction reference frame and the current frame to be encoded, the stronger the inter-frame correlation between them, and thus the larger the second weight corresponding to the prediction reference frame. For example, the inverse of the time difference between the prediction reference frame and the current frame to be encoded may be determined as the second weight corresponding to the prediction reference frame.
And after determining the second weight corresponding to each of the K prediction reference frames, weighting the second weighted predicted values corresponding to the K prediction reference frames respectively based on the second weights to obtain the first context index.
For example, let K=2, that is, the current frame to be encoded corresponds to 2 prediction reference frames, which include a forward frame and a backward frame of the current frame to be encoded. Let the second weight corresponding to the forward frame be W1 and the second weight corresponding to the backward frame be W2; the second weighted prediction value corresponding to the forward frame and the second weighted prediction value corresponding to the backward frame are then weighted based on W1 and W2 to obtain the first context index.
In one example, based on the second weights, second weighted prediction values corresponding to the K prediction reference frames are weighted averaged to obtain the first context index.
In another example, based on the second weights, second weighted prediction values corresponding to the K prediction reference frames are weighted and summed to obtain the first context index.
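Illustratively, this per-frame computation can be sketched as follows (a minimal sketch; taking the inverse of the time difference as the second weight follows the example above, and all names are illustrative assumptions):

    def first_context_index_k_frames(per_frame_first_values, per_frame_first_weights,
                                     frame_times, current_time):
        # one second weighted prediction value per prediction reference frame
        second_preds = []
        for values, weights in zip(per_frame_first_values, per_frame_first_weights):
            total = float(sum(weights))
            second_preds.append(sum(w / total * v for w, v in zip(weights, values)))
        # second weight per frame: inverse of the time difference to the current
        # frame (a prediction reference frame never shares the current frame's time)
        second_weights = [1.0 / abs(t - current_time) for t in frame_times]
        total = sum(second_weights)
        # weighted average of the per-frame values gives the first context index
        return round(sum(w / total * p for w, p in zip(second_weights, second_preds)))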
The above describes the procedure of determining the first context index at the encoding end.
The process of determining the second context index at the encoding end is described below.
As can be seen from the above process of determining the prediction nodes, each of the N prediction nodes includes one point or a plurality of points, and if each of the N prediction nodes includes one point, the second context index is determined using the one point included in each of the N prediction nodes.
In some embodiments, if a plurality of points are included in the prediction node, a point is selected from the plurality of points to determine the second context index. At this time, the above-mentioned S202-A1 includes the steps of S202-A1-31 and S202-A1-32 as follows:
S202-A1-31, for any one of the N prediction nodes, selecting, from the points included in the prediction node, a first point corresponding to the current point in the current node;
S202-A1-32, determining a second context index based on the coordinate information of the first point included in the N prediction nodes.
For example, assuming that the N prediction nodes include prediction node 1 and prediction node 2, where prediction node 1 includes point 1 and point 2, and prediction node 2 includes point 3, point 4, and point 5, one point is selected from points 1 and 2 of prediction node 1 as its first point, and one point is selected from points 3, 4, and 5 of prediction node 2 as its first point. The geometric information of the current point can then be determined based on the geometric information of the first point in prediction node 1 and the first point in prediction node 2.
The embodiment of the application does not limit the specific mode of selecting the first point corresponding to the current point of the current node from the points included in the predicted node.
In one possible implementation, the point in the prediction node whose order matches the order of the current point is determined as the first point corresponding to the current point. For example, assuming that the current point is the 2nd point in the current node, point 2 in prediction node 1 may be determined as the first point corresponding to the current point, and point 4 in prediction node 2 may be determined as the first point corresponding to the current point. For another example, if the prediction node includes only one point, that point is determined as the first point corresponding to the current point.
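A minimal sketch of this order-based selection (clamping the order for nodes with fewer points is an illustrative assumption):

    def select_first_point(pred_points, current_point_order):
        # a prediction node with a single point contributes that point directly
        if len(pred_points) == 1:
            return pred_points[0]
        # otherwise pick the point whose order matches the current point's order
        return pred_points[min(current_point_order, len(pred_points) - 1)]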
In one possible implementation manner, if the encoding end selects the first point corresponding to the current point from the points included in the prediction node based on the rate-distortion cost (or an approximate cost), the encoding end writes the identification information of the first point in the prediction node into the code stream, so that the decoding end obtains the first point in the prediction node by decoding the code stream.
The encoding end determines, for each of the N prediction nodes, a first point corresponding to a current point in each prediction node based on the above method, and further executes the step S202-A1-32 described above.
In the embodiment of the present application, the encoding end encodes the coordinate information of the current point on each coordinate axis separately. Based on this, the above S202-A1-32 includes the following step S202-A1-321:
S202-A1-321, determining a second context index corresponding to the ith coordinate axis based on the coordinate information, on the ith coordinate axis, of the first points included in the N prediction nodes.
The ith coordinate axis may be an X axis, a Y axis, or a Z axis, which is not limited in this embodiment of the present application.
In the embodiment of the present application, when encoding the coordinate information of the current point on the ith coordinate axis, the encoding end determines the second context index corresponding to the ith coordinate axis based on the coordinate information, on the ith coordinate axis, of the first points included in the N prediction nodes. It then selects a context model corresponding to the ith coordinate axis from the plurality of context models based on the first context index and/or the second context index corresponding to the ith coordinate axis, and performs predictive encoding on the coordinate information of the current point on the ith coordinate axis using that context model. For example, the encoding end determines the second context index corresponding to the X axis based on the coordinate information of the first points on the X axis included in the N prediction nodes, selects a context model corresponding to the X axis from the plurality of context models based on the first context index and/or the second context index corresponding to the X axis, and performs predictive encoding on the coordinate information of the current point on the X axis using that context model to obtain the X coordinate value of the current point. The Y and Z coordinate values of the current point are encoded in the same manner.
The following describes a process that the encoding end determines the second context index corresponding to the ith coordinate axis based on the coordinate information of the first point on the ith coordinate axis included in the N prediction nodes.
In the embodiment of the present application, the implementation manners of the above S202-A1-321 include, but are not limited to, the following:
In the first manner, the first points included in the N prediction nodes are weighted, and the second context index corresponding to the ith coordinate axis is determined based on the weighted coordinate information. At this time, the above S202-A1-321 includes the following steps S202-A1-321-11 to S202-A1-321-13:
S202-A1-321-11, determining a first weight corresponding to a prediction node;
S202-A1-321-12, carrying out weighting processing on coordinate information of first points included in N prediction nodes based on first weights to obtain first weighted points;
S202-A1-321-13, determining a second context index corresponding to the ith coordinate axis based on the coordinate information of the first weighting point on the ith coordinate axis.
In this manner, the current node corresponds to a plurality of prediction nodes, namely the N prediction nodes. When the second context index corresponding to the ith coordinate axis is determined based on the coordinate information of the first points included in the N prediction nodes, a weight, namely the first weight, may be determined for each of the N prediction nodes. The coordinate information of the first point included in each prediction node can then be weighted based on the first weight of that prediction node to obtain the first weighted point, and the second context index corresponding to the ith coordinate axis is determined from the coordinate information of the first weighted point on the ith coordinate axis, thereby improving the accuracy of encoding the current point based on the geometric coding information of the N prediction nodes.
The process of determining the first weights corresponding to the N prediction nodes in the embodiment of the present application may refer to the description of the foregoing embodiment, which is not repeated herein.
After determining a first weight corresponding to each of the N prediction nodes, the encoding end performs weighting processing on coordinate information of a first point included in the N prediction nodes based on the first weight to obtain a first weighting point.
The embodiment of the present application does not limit the specific manner of weighting the coordinate information of the first points included in the N prediction nodes based on the first weights to obtain the first weighted point.
In one example, based on the first weight, the coordinate information of the first points included in the N prediction nodes is weighted and averaged to obtain a first weighted point.
After the first weighted point is determined based on the above steps, the second context index corresponding to the ith coordinate axis is determined based on the coordinate information of the first weighted point on the ith coordinate axis.
As can be seen from the above, the first weighted point is obtained by weighting the first points of the N prediction nodes, where each bit of a first point takes only one of two values, 0 or 1. Thus, in some embodiments, each bit of the first weighted point obtained by weighting the first points of the N prediction nodes also takes the value 0 or 1. In this way, when the ith bit of the current point on the ith coordinate axis is encoded, the second context index corresponding to the ith bit on the ith coordinate axis is determined based on the value of the ith bit of the first weighted point on the ith coordinate axis. For example, if the value of the ith bit of the first weighted point on the ith coordinate axis is 0, the second context index corresponding to the ith bit on the ith coordinate axis is determined to be 0; if that value is 1, the second context index is determined to be 1. Finally, the encoding end determines the context model corresponding to the ith bit on the ith coordinate axis based on the first context index and/or the second context index corresponding to the ith bit on the ith coordinate axis, and performs predictive encoding on the value of the ith bit of the current point on the ith coordinate axis using that context model.
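Illustratively, this bit-wise derivation can be sketched as follows (a minimal sketch; rounding the weighted bit at 0.5 is an illustrative assumption):

    def ctx2_for_bit(first_points, first_weights, axis, bit):
        # value (0 or 1) of the given bit of each first point on the given axis
        bits = [(p[axis] >> bit) & 1 for p in first_points]
        total = float(sum(first_weights))
        # weighted average of the bits; rounding keeps the first weighted point's
        # bit at 0 or 1, as described above
        weighted = sum(w / total * b for w, b in zip(first_weights, bits))
        return 1 if weighted >= 0.5 else 0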
In addition to the first manner above, the encoding end may also determine the second context index in the following manner.
In the second manner, if K is greater than 1, the first points included in the prediction nodes in each of the K prediction reference frames are weighted, and the second context index corresponding to the ith coordinate axis is determined based on the weighted coordinate information. At this time, the above S202-A1-321 includes the following steps S202-A1-321-21 to S202-A1-321-23:
S202-A1-321-21, for the jth prediction reference frame of the K prediction reference frames, determining a first weight corresponding to a prediction node in the jth prediction reference frame;
S202-A1-321-22, carrying out weighting processing on coordinate information of a first point included in a prediction node in a jth prediction reference frame based on the first weight to obtain a second weighting point corresponding to the jth prediction reference frame, wherein j is a positive integer less than or equal to K;
S202-A1-321-23, determining a second context index corresponding to the ith coordinate axis based on second weighting points corresponding to the K prediction reference frames.
In this second manner, each of the K prediction reference frames is considered separately when determining the second context index. Specifically, the coordinate information of the first point in each prediction node of each of the K prediction reference frames is determined, a second weighted point corresponding to each prediction reference frame is determined, and the second context index corresponding to the ith coordinate axis is then determined based on the coordinate information of the second weighted points corresponding to the prediction reference frames, so that accurate selection of the second context index is achieved and the coding efficiency of the point cloud is improved.
In the embodiment of the present application, the specific manner of determining the second weighting point corresponding to each of the K prediction reference frames by the encoding end is the same, and for convenience of description, the j-th prediction reference frame of the K prediction reference frames is taken as an example for illustration.
In the embodiment of the present application, the current node includes at least one prediction node in the jth prediction reference frame, so that a second weighting point corresponding to the jth prediction reference frame is determined based on the coordinate information of a first point included in the at least one prediction node in the jth prediction reference frame.
For example, if the jth prediction reference frame includes 2 prediction nodes of the current node, denoted as prediction node 1 and prediction node 2, the coordinate information of the first point included in prediction node 1 and the coordinate information of the first point included in prediction node 2 are weighted to obtain the second weighted point corresponding to the jth prediction reference frame.
Before weighting the coordinate information of the first points included in the prediction nodes in the jth prediction reference frame, the encoding end first determines the first weight corresponding to each prediction node in the jth prediction reference frame. The determining process of the first weight may refer to the description of the foregoing embodiments, which is not repeated herein.
And then, the coding end carries out weighting processing on the coordinate information of the first point included in the prediction node in the j-th prediction reference frame based on the first weight to obtain a second weighting point corresponding to the j-th prediction reference frame.
In one example, based on the first weight, the coordinate information of the first point included in the prediction node in the jth prediction reference frame is weighted and averaged to obtain a second weighted point corresponding to the jth prediction reference frame.
The above describes the process of determining the second weighted point corresponding to the jth prediction reference frame; the second weighted points corresponding to the other prediction reference frames among the K prediction reference frames are determined in the same manner.
After determining the second weighting point corresponding to each of the K prediction reference frames, the encoding end executes the step S202-A1-321-23.
The embodiment of the present application does not limit the specific manner of determining the second context index corresponding to the ith coordinate axis based on the second weighted points corresponding to the K prediction reference frames.
In some embodiments, the encoding end determines an average value of the coordinate information of the second weighted points corresponding to the K prediction reference frames on the ith coordinate axis, and determines the second context index corresponding to the ith coordinate axis based on the average value.
In some embodiments, the above S202-A1-321-23 includes the following steps S202-A1-321-231 to S202-A1-321-233:
S202-A1-321-231, determining second weights corresponding to K prediction reference frames;
S202-A1-321-232, weighting the coordinate information of the second weighted points corresponding to the K prediction reference frames based on the second weights to obtain a third weighted point;
S202-A1-321-233, determining the second context index corresponding to the ith coordinate axis based on the coordinate information of the third weighted point on the ith coordinate axis.
In this embodiment, the encoding end may determine the second weight corresponding to each of the K prediction reference frames by referring to the method of the foregoing embodiment, and then weight the coordinate information of the second weighted points corresponding to the K prediction reference frames based on the second weights to obtain the third weighted point.
For example, let K=2, that is, the current frame to be encoded corresponds to 2 prediction reference frames, which include a forward frame and a backward frame of the current frame to be encoded. Let the second weight corresponding to the forward frame be W1 and the second weight corresponding to the backward frame be W2; the coordinate information of the second weighted point corresponding to the forward frame and that of the second weighted point corresponding to the backward frame are then weighted based on W1 and W2 to obtain the coordinate information of the third weighted point.
In one example, based on the second weights, the coordinate information of the second weighted points corresponding to the K prediction reference frames is weighted averaged to obtain the coordinate information of the third weighted point.
After determining the coordinate information of the third weighted point based on the above steps, the encoding end determines the second context index corresponding to the ith coordinate axis based on the coordinate information of the third weighted point on the ith coordinate axis.
As can be seen from the above, the third weighted point is obtained by weighting the first points in the prediction nodes of the prediction reference frames, where each bit of a first point takes only one of two values, 0 or 1. Thus, in some embodiments, each bit of the third weighted point also takes the value 0 or 1. In this way, when the ith bit of the current point on the ith coordinate axis is encoded, the second context index corresponding to the ith bit on the ith coordinate axis is determined based on the value of the ith bit of the third weighted point on the ith coordinate axis. For example, if the value of the ith bit of the third weighted point on the ith coordinate axis is 0, the second context index corresponding to the ith bit on the ith coordinate axis is determined to be 0; if that value is 1, the second context index is determined to be 1. Finally, the encoding end determines the context model corresponding to the ith bit on the ith coordinate axis based on the first context index and/or the second context index corresponding to the ith bit on the ith coordinate axis, and performs predictive encoding on the value of the ith bit of the current point on the ith coordinate axis using that context model.
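Illustratively, the K-frame variant can be sketched in the same bit-wise fashion (a minimal sketch; the per-bit weighting and the 0.5 rounding rule are illustrative assumptions):

    def ctx2_for_bit_k_frames(per_frame_bits, per_frame_first_weights, second_weights):
        # per_frame_bits[j]: bit values of the first points in the prediction
        # nodes of frame j; one second weighted bit per prediction reference frame
        second_bits = []
        for bits, weights in zip(per_frame_bits, per_frame_first_weights):
            total = float(sum(weights))
            second_bits.append(sum(w / total * b for w, b in zip(weights, bits)))
        # combine the per-frame values with the second weights into the third
        # weighted bit, which again rounds to 0 or 1
        total = float(sum(second_weights))
        third = sum(w / total * b for w, b in zip(second_weights, second_bits))
        return 1 if third >= 0.5 else 0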
After determining the first context index and/or the second context index based on the above steps, the encoding end determines the context model based on the first context index and/or the second context index, and encodes the coordinate information of the current point using the context model.
In one example, assume that the DCM mode information of a prediction node is PredDCMode, the number of points contained in the prediction node is PredNumPoints, and the geometric information of the first point in the prediction node is predPointPos. Assume that the encoding end performs predictive encoding on the geometric information of the current point using the IDCM mode of the prediction node and the geometric information of the point in the prediction node; that is, the geometric coding information of the prediction node used by the encoding end includes the following two kinds of information:
1) The IDCM mode of the prediction node;
2) The geometric information of the point (i.e., the first point) in the prediction node, i.e., the bit information (0 or 1) at the corresponding precision of the point in the prediction node.
For example, under the GPCC framework, the IDCM mode of the prediction node, PredDCMode, takes the value 0, 1, or 2. Under the AVS framework, PredDCMode takes the value 0 or 1.
Assuming that the number of points in the current node is numPoints, the geometric information of each point is PointPos, and the bit precision depth to be encoded is nodeSizeLog2, the geometric information encoding process of each point in the current node is as follows:
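Illustratively, this encoding process can be sketched as follows (a minimal sketch: encode_bit() stands in for a binary arithmetic coder, and deriving ctx1 directly from PredDCMode and indexing a 2-D context table are illustrative assumptions rather than the actual GPCC/AVS interface):

    def encode_current_node(points, nodeSizeLog2, PredDCMode, predPointPos,
                            context_models, encode_bit):
        # points: the numPoints points in the current node, each PointPos = (x, y, z)
        ctx1 = PredDCMode  # first context index from the prediction node's IDCM mode
        for PointPos in points:
            for axis in range(3):  # X, Y, Z coordinate axes
                # encode the nodeSizeLog2 bits of this coordinate, highest bit first
                for bit in range(nodeSizeLog2 - 1, -1, -1):
                    # ctx2: the co-located bit of the first point predPointPos
                    ctx2 = (predPointPos[axis] >> bit) & 1
                    encode_bit(context_models[ctx1][ctx2], (PointPos[axis] >> bit) & 1)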
The geometric information of each point in the current node can be encoded through the above process, where ctx1 is the first context index and ctx2 is the second context index.
In the point cloud encoding method provided by the embodiment of the present application, when the current node in the current frame to be encoded is encoded, N prediction nodes of the current node are determined in the prediction reference frame of the current frame to be encoded, and the coordinate information of the points in the current node is predictively encoded based on the geometric coding information of the points in the N prediction nodes. That is, the embodiment of the present application optimizes the coding of nodes coded in the DCM direct coding mode: by considering the correlation in the time domain between adjacent frames, the geometric information of the points in the IDCM node to be coded (i.e., the current node) is predictively encoded using the geometric information of the prediction nodes in the prediction reference frame, thereby improving the geometric information coding efficiency of the point cloud.
It should be understood that fig. 10 to 17 are only examples of the present application and should not be construed as limiting the present application.
The preferred embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present application within the scope of the technical concept of the present application, and all the simple modifications belong to the protection scope of the present application. For example, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further. As another example, any combination of the various embodiments of the present application may be made without departing from the spirit of the present application, which should also be regarded as the disclosure of the present application.
It should be further understood that, in the various method embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application. In addition, in the embodiments of the present application, the term "and/or" merely describes an association relationship between associated objects, indicating that three relationships may exist. Specifically, A and/or B may represent three cases: A alone exists, both A and B exist, and B alone exists. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The method embodiment of the present application is described in detail above with reference to fig. 10 to 17, and the apparatus embodiment of the present application is described in detail below with reference to fig. 18 to 19.
Fig. 18 is a schematic block diagram of a point cloud decoding apparatus provided by an embodiment of the present application.
As shown in fig. 18, the point cloud decoding apparatus 10 may include:
A determining unit 11, configured to determine N prediction nodes of a current node in a prediction reference frame of a current frame to be decoded, where the current node is a node to be decoded in the current frame to be decoded, and N is a positive integer;
and a decoding unit 12, configured to perform predictive decoding on coordinate information of points in the current node based on geometric decoding information of the N prediction nodes.
In some embodiments, the current frame to be decoded corresponds to K prediction reference frames, and the determining unit 11 is specifically configured to: determine, for the kth prediction reference frame of the K prediction reference frames, at least one prediction node of the current node in the kth prediction reference frame, where k is a positive integer less than or equal to K and K is a positive integer; and determine the N prediction nodes of the current node based on the at least one prediction node of the current node in each of the K prediction reference frames.
In some embodiments, the determining unit 11 is specifically configured to determine M domain nodes of the current node in the current frame to be decoded, where the M domain nodes include the current node and M is a positive integer, determine, for the ith domain node of the M domain nodes, a corresponding node of the ith domain node in the kth prediction reference frame, where i is a positive integer less than or equal to M, and determine, based on the corresponding nodes of the M domain nodes in the kth prediction reference frame, at least one prediction node of the current node in the kth prediction reference frame.
In some embodiments, the determining unit 11 is specifically configured to determine a corresponding node of the current node in the kth prediction reference frame, determine at least one domain node of the corresponding node, and determine the at least one domain node as at least one prediction node of the current node in the kth prediction reference frame.
In some embodiments, the determining unit 11 is specifically configured to determine, in the current frame to be decoded, a parent node of an ith node as the ith parent node, where the ith node is the ith domain node or is the current node, determine a matching node of the ith parent node in the kth prediction reference frame as the ith matching node, and determine one of child nodes of the ith matching node as a corresponding node of the ith node in the kth prediction reference frame.
In some embodiments, the determining unit 11 is specifically configured to determine, based on the occupancy information of the ith parent node, a matching node of the ith parent node in the kth prediction reference frame.
In some embodiments, the determining unit 11 is specifically configured to determine the node in the kth prediction reference frame whose occupancy information has the smallest difference from the occupancy information of the ith parent node as the matching node of the ith parent node in the kth prediction reference frame.
In some embodiments, the determining unit 11 is specifically configured to determine a first sequence number of the ith node among the child nodes included in the ith parent node, and determine the child node with the first sequence number among the child nodes of the ith matching node as the corresponding node of the ith node in the kth prediction reference frame.
In some embodiments, the determining unit 11 is specifically configured to determine, as at least one prediction node of the current node in the kth prediction reference frame, a corresponding node of the M domain nodes in the kth prediction reference frame.
In some embodiments, the determining unit 11 is specifically configured to determine at least one prediction node of the current node in the K prediction reference frames as N prediction nodes of the current node.
In some embodiments, if the current frame to be decoded is a P frame, the K prediction reference frames include a forward frame of the current frame to be decoded.
In some embodiments, if the current frame to be decoded is a B frame, the K prediction reference frames include a forward frame and a backward frame of the current frame to be decoded.
In some embodiments, the decoding unit 12 is specifically configured to determine an index of a context model based on the geometric decoding information of the N prediction nodes, determine the context model based on the index of the context model, and perform predictive decoding on the coordinate information of the current point in the current node using the context model.
In some embodiments, the geometric decoding information of the prediction node includes direct decoding information of the prediction node and/or position information of points in the prediction node, where the direct decoding information is used to indicate whether the prediction node meets the condition of decoding in a direct decoding manner, and the decoding unit 12 is specifically configured to determine a first context index based on the direct decoding information of the N prediction nodes and/or determine a second context index based on the coordinate information of the points in the N prediction nodes, and select the context model from a preset plurality of context models based on the first context index and/or the second context index.
In some embodiments, the decoding unit 12 is specifically configured to determine, for any one of the N prediction nodes, a first value corresponding to the prediction node based on the direct decoding information of the prediction node, and determine the first context index based on the first values corresponding to the N prediction nodes.
In some embodiments, the decoding unit 12 is specifically configured to determine the number of the direct decoding mode of the prediction node as the first value corresponding to the prediction node.
In some embodiments, the decoding unit 12 is specifically configured to determine a first weight corresponding to the prediction node, perform weighting processing on the first values corresponding to the N prediction nodes based on the first weight to obtain a first weighted prediction value, and determine the first context index based on the first weighted prediction value.
In some embodiments, the decoding unit 12 is specifically configured to determine, for the jth prediction reference frame of the K prediction reference frames, a first value corresponding to the prediction node of the current node in the jth prediction reference frame based on the direct decoding information of that prediction node, where j is a positive integer less than or equal to K, determine a first weight corresponding to the prediction node, perform weighting processing on the first value corresponding to the prediction node in the jth prediction reference frame based on the first weight to obtain a second weighted prediction value corresponding to the jth prediction reference frame, and determine the first context index based on the second weighted prediction values corresponding to the K prediction reference frames.
In some embodiments, the decoding unit 12 is specifically configured to determine second weights corresponding to the K prediction reference frames, and perform weighting processing on the second weighted prediction values corresponding to the K prediction reference frames based on the second weights, so as to obtain the first context index.
In some embodiments, the decoding unit 12 is specifically configured to, for any one of the N prediction nodes, select a first point corresponding to the current point in the current node from the points included in the prediction node, and determine the second context index based on the coordinate information of the first points included in the N prediction nodes.
In some embodiments, the decoding unit 12 is specifically configured to determine, based on the coordinate information, on the ith coordinate axis, of the first points included in the N prediction nodes, a second context index corresponding to the ith coordinate axis, where the ith coordinate axis is the X coordinate axis, the Y coordinate axis or the Z coordinate axis, select, based on the first context index and/or the second context index corresponding to the ith coordinate axis, a context model corresponding to the ith coordinate axis from the multiple context models, and perform predictive decoding on the coordinate information of the current point on the ith coordinate axis using the context model corresponding to the ith coordinate axis.
In some embodiments, the decoding unit 12 is specifically configured to determine a first weight corresponding to the prediction node, perform weighting processing on the coordinate information of the first points included in the N prediction nodes based on the first weight to obtain a first weighted point, and determine the second context index corresponding to the ith coordinate axis based on the coordinate information of the first weighted point on the ith coordinate axis.
In some embodiments, if K is greater than 1, the decoding unit 12 is specifically configured to determine, for the jth prediction reference frame of the K prediction reference frames, a first weight corresponding to a prediction node in the jth prediction reference frame, perform, based on the first weight, weighting processing on the coordinate information of the first point included in the prediction node in the jth prediction reference frame to obtain a second weighted point corresponding to the jth prediction reference frame, where j is a positive integer less than or equal to K, and determine, based on the second weighted points corresponding to the K prediction reference frames, the second context index corresponding to the ith coordinate axis.
In some embodiments, the decoding unit 12 is specifically configured to determine second weights corresponding to the K prediction reference frames, perform weighting processing on the coordinate information of the second weighted points corresponding to the K prediction reference frames based on the second weights to obtain a third weighted point, and determine the second context index corresponding to the ith coordinate axis based on the coordinate information of the third weighted point on the ith coordinate axis.
In some embodiments, the decoding unit 12 is specifically configured to determine the first weight corresponding to the prediction node based on the distance between the domain node corresponding to the prediction node and the current node.
In some embodiments, the decoding unit 12 is specifically configured to determine the second weight corresponding to the prediction reference frame based on the time gap between the prediction reference frame and the current frame to be decoded.
It should be understood that apparatus embodiments and method embodiments may correspond with each other and that similar descriptions may refer to the method embodiments. To avoid repetition, no further description is provided here. Specifically, the point cloud decoding apparatus 10 shown in fig. 18 may correspond to a corresponding main body in performing the point cloud decoding method according to the embodiment of the present application, and the foregoing and other operations and/or functions of each unit in the point cloud decoding apparatus 10 are respectively for implementing a corresponding flow in the point cloud decoding method, which are not described herein for brevity.
Fig. 19 is a schematic block diagram of a point cloud encoding apparatus provided by an embodiment of the present application.
As shown in fig. 19, the point cloud encoding apparatus 20 includes:
a determining unit 21, configured to determine N prediction nodes of a current node in a prediction reference frame of a current frame to be encoded, where the current node is a node to be encoded in the current frame to be encoded, and N is a positive integer;
And the encoding unit 22 is used for performing predictive encoding on the coordinate information of the points in the current node based on the geometric encoding information of the N prediction nodes.
In some embodiments, the current frame to be encoded corresponds to K prediction reference frames, and the determining unit 21 is specifically configured to: determine, for the kth prediction reference frame of the K prediction reference frames, at least one prediction node of the current node in the kth prediction reference frame, where k is a positive integer less than or equal to K and K is a positive integer; and determine the N prediction nodes of the current node based on the at least one prediction node of the current node in each of the K prediction reference frames.
In some embodiments, the determining unit 21 is specifically configured to determine M domain nodes of the current node in the current frame to be encoded, where the M domain nodes include the current node and M is a positive integer, determine, for the ith domain node of the M domain nodes, a corresponding node of the ith domain node in the kth prediction reference frame, where i is a positive integer less than or equal to M, and determine, based on the corresponding nodes of the M domain nodes in the kth prediction reference frame, at least one prediction node of the current node in the kth prediction reference frame.
In some embodiments, the determining unit 21 is specifically configured to determine a corresponding node of the current node in the kth prediction reference frame, determine at least one domain node of the corresponding node, and determine the at least one domain node as at least one prediction node of the current node in the kth prediction reference frame.
In some embodiments, the determining unit 21 is specifically configured to determine, in the current frame to be encoded, a parent node of an ith node as the ith parent node, where the ith node is the ith domain node or is the current node, determine a matching node of the ith parent node in the kth prediction reference frame as the ith matching node, and determine one of child nodes of the ith matching node as a corresponding node of the ith node in the kth prediction reference frame.
In some embodiments, the determining unit 21 is specifically configured to determine, based on the occupancy information of the ith parent node, a matching node of the ith parent node in the kth prediction reference frame.
In some embodiments, the determining unit 21 is specifically configured to determine the node in the kth prediction reference frame whose occupancy information has the smallest difference from the occupancy information of the ith parent node as the matching node of the ith parent node in the kth prediction reference frame.
In some embodiments, the determining unit 21 is specifically configured to determine a first sequence number of the ith node among the child nodes included in the ith parent node, and determine the child node with the first sequence number among the child nodes of the ith matching node as the corresponding node of the ith node in the kth prediction reference frame.
In some embodiments, the determining unit 21 is specifically configured to determine, as at least one prediction node of the current node in the kth prediction reference frame, a corresponding node of the M domain nodes in the kth prediction reference frame.
In some embodiments, the determining unit 21 is specifically configured to determine at least one prediction node of the current node in the K prediction reference frames as N prediction nodes of the current node.
In some embodiments, if the current frame to be encoded is a P frame, the K prediction reference frames include a forward frame of the current frame to be encoded.
In some embodiments, if the current frame to be encoded is a B frame, the K prediction reference frames include a forward frame and a backward frame of the current frame to be encoded.
In some embodiments, the encoding unit 22 is specifically configured to determine an index of a context model based on geometric encoding information of the N prediction nodes, determine the context model based on the index of the context model, and perform predictive encoding on coordinate information of a current point in the current node using the context model.
In some embodiments, the geometric coding information of the prediction node includes direct coding information of the prediction node and/or position information of points in the prediction node, where the direct coding information is used to indicate whether the prediction node meets a condition of coding by a direct coding mode, and the coding unit 22 is specifically configured to determine a first context index based on the direct coding information of the N prediction nodes and/or determine a second context index based on coordinate information of points in the N prediction nodes, and select the context model from a preset plurality of context models based on the first context index and/or the second context index.
In some embodiments, the encoding unit 22 is specifically configured to determine, for any one of the N prediction nodes, a first value corresponding to the prediction node based on direct encoding information of the prediction node, and determine the first context index based on the first value corresponding to the N prediction nodes.
In some embodiments, the direct coding information includes a direct coding mode of the prediction node, and the coding unit 22 is specifically configured to determine a number of the direct coding mode of the prediction node, and determine a first value corresponding to the prediction node.
In some embodiments, the encoding unit 22 is specifically configured to determine the first weights corresponding to the N prediction nodes, perform weighting processing on the first values corresponding to the N prediction nodes based on the first weights to obtain a first weighted prediction value, and determine the first context index based on the first weighted prediction value.
In some embodiments, the encoding unit 22 is specifically configured to: determine, for a jth prediction reference frame of the K prediction reference frames, the first value corresponding to each prediction node of the current node in the jth prediction reference frame based on the direct encoding information of that prediction node, where j is a positive integer less than or equal to K; determine the first weight corresponding to each such prediction node, and perform weighting processing on the first values corresponding to the prediction nodes in the jth prediction reference frame based on the first weights to obtain a second weighted prediction value corresponding to the jth prediction reference frame; and determine the first context index based on the second weighted prediction values corresponding to the K prediction reference frames.
In some embodiments, the encoding unit 22 is specifically configured to determine second weights corresponding to the K prediction reference frames, and perform weighting processing on second weighted prediction values corresponding to the K prediction reference frames respectively based on the second weights, so as to obtain the first context index.
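The two-level weighting of the first values described in the preceding paragraphs can be sketched as follows; the choice of a binary first value (1 if the prediction node is direct-coded, 0 otherwise), the normalization, and the final quantization step are assumptions made for illustration.

```python
# Sketch of the two-level weighting that yields the first context index.
# Assumptions: first values are 0/1 per prediction node; weights are
# normalized; the combined value is quantized to a small integer index.

def first_context_index(first_values, node_weights, frame_weights, levels=4):
    """first_values / node_weights: per reference frame, a list of the
    0/1 first values and first weights of its prediction nodes;
    frame_weights: one second weight per reference frame."""
    frame_preds = []
    for values, weights in zip(first_values, node_weights):
        total = sum(weights) or 1.0
        # Second weighted prediction value for this reference frame.
        frame_preds.append(sum(v * w for v, w in zip(values, weights)) / total)
    total_fw = sum(frame_weights) or 1.0
    combined = sum(p * w for p, w in zip(frame_preds, frame_weights)) / total_fw
    # Quantize the weighted prediction value into one of `levels` indices.
    return min(int(combined * levels), levels - 1)
```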
In some embodiments, the encoding unit 22 is specifically configured to, for any one of the N prediction nodes, select a first point corresponding to the current point in the current node from the points included in the prediction node, and determine the second context index based on the coordinate information of the first points included in the N prediction nodes.
In some embodiments, the encoding unit 22 is specifically configured to determine, based on coordinate information of a first point included in the N prediction nodes on an ith coordinate axis, a second context index corresponding to the ith coordinate axis, where the ith coordinate axis is an X coordinate axis, a Y coordinate axis or a Z coordinate axis, select, based on the first context index and/or the second context index corresponding to the ith coordinate axis, a context model corresponding to the ith coordinate axis from the multiple context models, and perform predictive encoding on the coordinate information of the current point on the ith coordinate axis using the context model corresponding to the ith coordinate axis.
In some embodiments, the encoding unit 22 is specifically configured to determine the first weights corresponding to the N prediction nodes, perform weighting processing on the coordinate information of the first points included in the N prediction nodes based on the first weights to obtain a first weighted point, and determine the second context index corresponding to the ith coordinate axis based on the coordinate information of the first weighted point on the ith coordinate axis.
In some embodiments, the encoding unit 22 is specifically configured to: determine, for a jth prediction reference frame of the K prediction reference frames, the first weights corresponding to the prediction nodes in the jth prediction reference frame, and perform weighting processing on the coordinate information of the first points included in the prediction nodes in the jth prediction reference frame based on the first weights to obtain a second weighted point corresponding to the jth prediction reference frame, where j is a positive integer less than or equal to K; and determine, based on the second weighted points corresponding to the K prediction reference frames, the second context index corresponding to the ith coordinate axis.
In some embodiments, the encoding unit 22 is specifically configured to determine the second weights corresponding to the K prediction reference frames, perform weighting processing on the coordinate information of the second weighted points corresponding to the K prediction reference frames based on the second weights to obtain a third weighted point, and determine the second context index corresponding to the ith coordinate axis based on the coordinate information of the third weighted point on the ith coordinate axis.
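Analogously, the second context index for one coordinate axis can be sketched from the weighted coordinates of the first points; taking the index from the bit of the third weighted point at the bit-plane currently being coded is an assumption for illustration only.

```python
# Sketch of deriving the per-axis second context index. Points are
# (x, y, z) tuples; the bit-plane rule at the end is an assumption.

def second_context_index(frame_points, node_weights, frame_weights, axis, bit):
    """frame_points / node_weights: per reference frame, the first points
    of its prediction nodes and their first weights; frame_weights: one
    second weight per frame; axis: 0, 1 or 2 for X, Y or Z."""
    def weighted_point(points, weights):
        total = sum(weights) or 1.0
        return tuple(sum(p[a] * w for p, w in zip(points, weights)) / total
                     for a in range(3))
    # Second weighted point per reference frame.
    frame_pts = [weighted_point(pts, ws)
                 for pts, ws in zip(frame_points, node_weights)]
    total_fw = sum(frame_weights) or 1.0
    # Third weighted point across the K reference frames.
    third = tuple(sum(p[a] * w for p, w in zip(frame_pts, frame_weights)) / total_fw
                  for a in range(3))
    return (int(third[axis]) >> bit) & 1
```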
In some embodiments, the encoding unit 22 is specifically configured to determine the first weight corresponding to the prediction node based on the distance between the neighbor node corresponding to the prediction node and the current node.
In some embodiments, the encoding unit 22 is specifically configured to determine the second weight corresponding to the prediction reference frame based on a temporal difference between the prediction reference frame and the current frame to be encoded.
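A plausible concrete choice, consistent with the dependence stated in the two preceding paragraphs but not mandated by them, is to make both weights inversely proportional to the respective distance, for example:

```python
# Illustrative weight choices: nearer neighbor nodes and temporally
# closer reference frames contribute more. The inverse-distance form
# is an assumption; the text above only requires the weights to depend
# on these distances.

def first_weight(neighbor_pos, current_pos):
    """First weight of a prediction node, from the spatial distance
    between its neighbor node and the current node."""
    d = sum((a - b) ** 2 for a, b in zip(neighbor_pos, current_pos)) ** 0.5
    return 1.0 / (1.0 + d)

def second_weight(ref_frame_index, current_frame_index):
    """Second weight of a prediction reference frame, from its temporal
    distance to the current frame to be encoded."""
    return 1.0 / (1.0 + abs(current_frame_index - ref_frame_index))
```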
It should be understood that the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, details are not repeated here. Specifically, the point cloud encoding apparatus 20 shown in fig. 19 may correspond to the subject that performs the point cloud encoding method of the embodiments of the present application, and the foregoing and other operations and/or functions of the units in the point cloud encoding apparatus 20 are respectively intended to implement the corresponding flows in the point cloud encoding method; for brevity, they are not described here again.
The apparatus and system of the embodiments of the present application are described above in terms of functional units in conjunction with the accompanying drawings. It should be understood that the functional units may be implemented in hardware, by instructions in software, or by a combination of hardware and software units. Specifically, the steps of the method embodiments in the embodiments of the present application may be completed by an integrated logic circuit of hardware in a processor and/or by instructions in software form, and the steps of the methods disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor or by a combination of hardware and software units in a decoding processor. Alternatively, the software units may reside in a storage medium well established in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method embodiments in combination with its hardware.
Fig. 20 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
As shown in fig. 20, the electronic device 30 may be a point cloud decoding device or a point cloud encoding device according to an embodiment of the present application, and the electronic device 30 may include:
A memory 33 and a processor 32, the memory 33 being adapted to store a computer program 34 and to transmit the computer program 34 to the processor 32. In other words, the processor 32 may call and run the computer program 34 from the memory 33 to implement the methods of the embodiments of the present application.
For example, the processor 32 may be configured to perform the steps of the method 200 described above in accordance with instructions in the computer program 34.
In some embodiments of the present application, the processor 32 may include, but is not limited to:
A general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the present application, the memory 33 includes, but is not limited to:
Volatile memory and/or nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synch-Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
In some embodiments of the application, the computer program 34 may be partitioned into one or more units that are stored in the memory 33 and executed by the processor 32 to perform the methods provided by the application. The one or more units may be a series of computer program instruction segments capable of accomplishing specified functions, and the instruction segments describe the execution process of the computer program 34 in the electronic device 30.
As shown in fig. 20, the electronic device 30 may further include:
A transceiver 33, where the transceiver 33 may be connected to the processor 32 or the memory 33.
The processor 32 may control the transceiver 33 to communicate with other devices, and in particular, may send information or data to other devices or receive information or data sent by other devices. The transceiver 33 may include a transmitter and a receiver. The transceiver 33 may further include antennas, the number of which may be one or more.
It will be appreciated that the various components in the electronic device 30 are connected by a bus system that includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
Fig. 21 is a schematic block diagram of a point cloud codec system provided by an embodiment of the present application.
As shown in fig. 21, the point cloud encoding and decoding system 40 may include a point cloud encoder 41 and a point cloud decoder 42, wherein the point cloud encoder 41 is configured to perform the point cloud encoding method according to the embodiment of the present application, and the point cloud decoder 42 is configured to perform the point cloud decoding method according to the embodiment of the present application.
The present application also provides a code stream, where the code stream is generated according to the encoding method described above.
The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, enables the computer to perform the method of the above-described method embodiments. Alternatively, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of the method embodiments described above.
When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, the functional units in the various embodiments of the application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto; any variation or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (56)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/071072 WO2024145935A1 (en) | 2023-01-06 | 2023-01-06 | Point cloud encoding method and apparatus, point cloud decoding method and apparatus, device, and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120303940A true CN120303940A (en) | 2025-07-11 |
Family
ID=91803377
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202380078227.5A Pending CN120303940A (en) | 2023-01-06 | 2023-01-06 | Point cloud encoding and decoding method, device, equipment and storage medium |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN120303940A (en) |
| WO (1) | WO2024145935A1 (en) |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114095735B (en) * | 2020-08-24 | 2025-09-12 | 北京大学深圳研究生院 | A point cloud geometry inter-frame prediction method based on block motion estimation and motion compensation |
| CN112565764B (en) * | 2020-12-03 | 2022-10-04 | 西安电子科技大学 | Point cloud geometric information interframe coding and decoding method |
| CN117321991A (en) * | 2021-06-11 | 2023-12-29 | Oppo广东移动通信有限公司 | Prediction method, device and codec for point cloud attributes |
| CN115471627B (en) * | 2021-06-11 | 2026-01-30 | 维沃移动通信有限公司 | Geometric information encoding and decoding methods and related equipment for point clouds |
| CN114143556B (en) * | 2021-12-28 | 2024-12-03 | 苏州联视泰电子信息技术有限公司 | An inter-frame encoding and decoding method for compressing 3D sonar point cloud data |
- 2023-01-06 CN CN202380078227.5A patent/CN120303940A/en active Pending
- 2023-01-06 WO PCT/CN2023/071072 patent/WO2024145935A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2024145935A1 (en) | 2024-07-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN117321991A (en) | Prediction method, device and codec for point cloud attributes | |
| US20260019635A1 (en) | Point cloud encoding method and apparatus, point cloud decoding method and apparatus, device, and storage medium | |
| CN115086716A (en) | Method and device for selecting neighbor points in point cloud and coder/decoder | |
| CN119366154A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN120476590A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN119366190A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN117354496A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN120303940A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN120188479A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN120435867A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| US20260039871A1 (en) | Point cloud encoding and decoding method and apparatus, device and storage medium | |
| US20260039870A1 (en) | Point cloud encoding method and apparatus, point cloud decoding method and apparatus, device, and storage medium | |
| CN120419182A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| US20260032286A1 (en) | Point cloud encoding/decoding method and apparatus, and device and storage medium | |
| CN120752918A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN120345252A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN119366185A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN116866615A (en) | Point cloud coding method and equipment | |
| CN119948873A (en) | Point cloud encoding and decoding method, device, equipment and storage medium | |
| CN119854515A (en) | Encoding and decoding methods, devices and electronic equipment | |
| WO2025007360A1 (en) | Coding method, decoding method, bit stream, coder, decoder, and storage medium | |
| HK40084295A (en) | Point cloud encoding and decoding method, apparatus, device, and storage medium | |
| CN121176016A (en) | Encoding and decoding methods, encoders, decoders, bitstreams, and storage media | |
| CN121002850A (en) | Encoding/decoding methods, bitstreams, encoders, decoders, and storage media | |
| CN120858572A (en) | Coding and decoding method, encoder, decoder, code stream and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||