WO2018017599A1 - Quality evaluation system and method for 360-degree video - Google Patents
Quality evaluation system and method for 360-degree video
- Publication number
- WO2018017599A1 (application PCT/US2017/042646, US2017042646W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sample
- distortion
- weighted
- video
- area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/154—Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/107—Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/124—Quantisation
Definitions
- VR Virtual reality
- VR has many application areas: healthcare, education, social networking, industry design/training, games, movies, shopping, entertainment, etc. It is capable of bringing an immersive experience to the user and is thus known as immersive multimedia. It creates an artificial environment, and the user feels present in that environment. A user's experience is further improved by sensory and other interactions such as posture, gesture, eye gaze, voice, etc.
- a VR system also may provide haptic feedback to the user.
- the user in some systems is presented with a 360-degree video, providing 360-degree viewing in the horizontal direction and 180-degree viewing in the vertical direction.
- a VR system uses a multiple-camera system to capture the scene from different divergent views (e.g. 6-12 views). Those views are stitched together to form a high-resolution 360-degree video (e.g. 4K or 8K).
- the current virtual reality system usually consists of a computation platform, head-mounted display (HMD), and head tracking sensors.
- the computation platform is in charge of receiving and decoding 360-degree video, and generating the viewport for display. Two pictures, one for each eye, are rendered for the viewport. The two pictures are displayed in the HMD for stereo viewing.
- a lens is used to magnify the image displayed in the HMD for better viewing.
- the head tracking sensor constantly keeps track of the viewer's head orientation and feeds the orientation information to the system to display the viewport picture for that orientation.
- Some VR systems may provide a specialized touch device for viewer to interact with objects in the virtual world.
- Gear VR is a lightweight VR system, which uses a smartphone as the computation platform, HMD display, and head tracking sensor.
- another system is the HTC Vive. The Oculus Rift and the Vive have similar performance.
- the spatial HMD resolution is 2160x1200
- refresh rate is 90Hz
- the field of view (FOV) is 110 degrees.
- the sampling rate for the head tracking sensor is 1000Hz, which can capture very fast movement.
- Google also has a simple VR system called Cardboard. It consists of lenses and a cardboard frame. Like the Gear VR, it is driven by a smartphone. In terms of 360-degree video streaming services, YouTube and Facebook are among the early providers.
- Many companies are working on 360-degree video compression and delivery systems, and they have their own solutions.
- Google YouTube provided a channel for 360-degree video streaming based on dynamic adaptive streaming over HTTP (DASH).
- Facebook also has solutions for 360-degree video delivery.
- One way to provide 360-degree video delivery is to represent the 360-degree information using a sphere geometry structure. For example, the synchronized multiple views captured by the multiple cameras are stitched on the sphere as one integral structure. Then the sphere information is projected to 2D planar surface with a given geometry conversion process.
- In spherical video, there is a mapping between a grid of samples and respective points on a unit sphere.
- distortion is measured at each sample of interest, and the distortion of each sample is weighted by the area on the unit sphere associated with the sample.
- a plurality of points on the unit sphere are selected, and the points are mapped to a nearest sample on the sample grid. Distortion is calculated at the nearest sample points and/or is weighted by a latitude-dependent weighting based on the latitude of the respective nearest sample point. The latitude-dependent weighting may be based on a viewing probability for that latitude.
- a method is described for generating an area-weighted spherical peak signal-to-noise ratio (AW-SPSNR) for at least a selected portion of a coded spherical video, where the spherical video is associated with a mapping between regions on a unit sphere and samples in a grid.
- an area-weighted distortion is determined for each of a plurality of samples in the grid, wherein the area-weighted distortion is the unweighted distortion at the sample multiplied by the area of the region of the unit sphere associated with the respective sample.
- a sum of weighted distortions (SWD) is calculated by summing the determined area-weighted distortions.
- a sum of weights (SW) is calculated by summing the areas of the regions of the unit sphere associated with the respective samples.
- a peak value P is determined from among the plurality of samples.
- the AW-SPSNR may be calculated as:
- AW-SPSNR(c) = 10 · log10( P² / (SWD/SW) )
- a substantially spherical video is coded, with at least one coding-related decision being made using a rate-distortion metric determined based at least in part on the AW-SPSNR.
- FIGs. 1 A-1B provide a schematic illustration of sphere geometry projection to a 2D plane using an equirectangular projection.
- FIG. 1 A illustrates sphere sampling in longitude and latitude.
- FIG. 1B illustrates a 2D plane with equirectangular projection. The point P on the sphere is projected to point q in the 2D plane.
- FIG. 2 illustrates uneven vertical sampling in 3D space with equal latitude intervals.
- FIG. 3 illustrates a sphere geometry representation with cubemap projection, PX (0), NX (1), PY (2), NY (3), PZ (4), NZ (5).
- FIG. 4 is a schematic illustration of comparison of a ground truth signal with coded panorama videos as described in M. Yu, H. Lakshman, B. Girod, "A Framework to Evaluate Omnidirectional Video Coding Schemes", IEEE International Symposium on Mixed and Augmented Reality, 2015.
- FIG. 5 is a schematic illustration of a sampling grid for the 4:2:0 chroma format, where chroma is located at "Dx" positions and luma is located at "Ux" positions.
- FIG. 6 is a flow chart illustrating intermediate and end-to-end quality evaluation in SPSNR.
- FIG. 7 is a schematic illustration of a method for calculating SPSNR, particularly for use when the original video and the reconstructed video have the same projection format and resolution.
- FIG. 8 is a schematic illustration of an alternative method for calculating SPSNR, particularly for use when the original video and the reconstructed video have different projection formats and/or resolutions.
- FIG. 9 is a schematic illustration of an alternative method for calculating SPSNR, particularly when the original video and the reconstructed video have different projection formats and/or resolutions. Interpolation may be used in obtaining a sample value for the reconstructed video.
- FIG. 10 is a functional block diagram depicting an exemplary video encoder.
- FIG. 11 illustrates an exemplary wireless transmit/receive unit (WTRU) that may be employed as a video encoder and/or an apparatus for video quality evaluation in some embodiments.
- WTRU wireless transmit/receive unit
- FIG. 12 illustrates an exemplary network entity that may be employed as a video encoder and/or an apparatus for video quality evaluation in some embodiments.
- FIG. 1A illustrates sphere sampling using longitudes (φ) and latitudes (θ).
- FIG. 1B illustrates the sphere being projected to a 2D plane using equirectangular projection.
- the longitude φ in the range [-π, π] is known as yaw.
- the latitude θ in the range [-π/2, π/2] is known as pitch in aviation, where π is the ratio of a circle's circumference to its diameter.
- the coordinates (x, y, z) are used to represent a point's location in 3D space.
- the coordinates (ue, ve) are used to represent a point's location in a 2D plane.
- W and H are the width and height of the 2D planar picture.
- the point P is the cross point between longitude L4 and latitude A1 on the sphere.
- the point q in the 2D plane can be projected back to the point P on the sphere via inverse projection.
- the field of view (FOV) in FIG. 1B shows an example in which the FOV on the sphere is mapped to the 2D plane with the view angle along the X axis being about 110 degrees.
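- As an illustration of the equirectangular mapping just described, the following sketch converts between a point (φ, θ) on the unit sphere and a sample position (ue, ve) on a W×H ERP picture. It is a minimal Python sketch assuming the coordinate conventions of FIGs. 1A-1B; the function names are illustrative and not taken from any reference software.

```python
import math

def erp_to_sphere(ue, ve, W, H):
    """Map an ERP sample position (ue, ve) to (longitude, latitude) on the unit sphere."""
    phi = (ue / W - 0.5) * 2.0 * math.pi    # longitude (yaw) in [-pi, pi]
    theta = (0.5 - ve / H) * math.pi        # latitude (pitch) in [-pi/2, pi/2]
    return phi, theta

def sphere_to_erp(phi, theta, W, H):
    """Inverse projection: map (longitude, latitude) back to ERP plane coordinates."""
    ue = (phi / (2.0 * math.pi) + 0.5) * W
    ve = (0.5 - theta / math.pi) * H
    return ue, ve
```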
- EAP equal-area projection
- FIG. 3 shows an example of a projection for a cubemap.
- Spherical video may be presented as a panoramic video in different projection formats.
- the process of mapping sphere to the panoramic video leads to different sampling densities on the sphere.
- the sampling density approaches infinity toward the poles, and for cubemap, the sampling density is greater at the corners of the faces than at the centers of the cube faces. This scenario is different from a normal 2D video.
- different points on the sphere have different viewing probabilities. For example, the points at the equator are more likely to be viewed as compared to the points at the poles.
- M. Yu et al. have proposed different metrics which include Spherical Peak Signal- to-Noise Ratio (SPSNR), Weighted Spherical PSNR (W-SPSNR) and Latitude Spherical PSNR (L-SPSNR) in M. Yu, H. Lakshman, B. Girod, "A Framework to Evaluate Omnidirectional Video Coding Schemes," IEEE International Symposium on Mixed and Augmented Reality, 2015, which is incorporated herein by reference in its entirety. These may be described as follows.
- SPSNR Spherical Peak Signal- to-Noise Ratio
- W-SPSNR Weighted Spherical PSNR
- L-SPSNR Latitude Spherical PSNR
- Spherical PSNR: In SPSNR, a uniformly sampled set of points on a unit sphere is mapped onto the two videos to be compared. The error at these mapped coordinates on the panoramic videos is then computed.
- The advantage of this metric is that panoramic videos in any projection format can be compared in a fair manner. The metric overcomes the problem of different sampling densities in different projection formats.
- FIG. 4 depicts this concept. Here, a point s from a uniformly sampled set of points on the unit sphere is mapped onto the panoramic videos, shown as q and r in the decoded videos Pano1 and Pano2 in FIG. 4, respectively.
- Pano1 is one projection format, such as equal-area.
- Pano2 is another projection format, such as cubemap.
- These signals are compared against the ground truth sample g.
- To measure the distortion of Pano1, the difference between g and q is calculated.
- To measure the loss in the geometry conversion going from ERP to cubemap and back to ERP, the original ERP is the ground truth and the reconstructed ERP is equivalent to the decoded video in FIG. 4.
- W-SPSNR Weighted Spherical PSNR
- L-SPSNR Latitude Spherical PSNR
- the 3D map of weights used in W-SPSNR is marginalized along the longitudes so as to obtain the weights only along the latitude.
- these latitude weights are used to weight the errors based on their latitude positions.
- PSNR can be used as one objective quality metric for 2D planar picture quality evaluation.
- the sampling density in ERP is uneven.
- the sampling density approaches infinity in regions approaching the pole.
- the top portion of a picture corresponding to the North Pole and the bottom portion corresponding to the South Pole are "stretched," which indicates that the equirectangular sampling in the 2D spatial domain is uneven.
- In SPSNR, by using uniformly sampled points on a unit sphere for comparing 360-degree videos in ERP, many samples are neglected in the comparison. In particular, fewer samples are considered for comparison toward the poles. It would thus be beneficial to devise a new metric that takes all the samples into account and gives a meaningful PSNR.
- the error is weighted by the latitude weight of the new interpolated sample point in the sampling grid.
- the latitude weight may not correctly match the position of the interpolated nearest neighbor sample.
- a point in the uniformly sampled set of points on unit sphere is mapped to a projected 360 video.
- the sample value is derived with nearest neighbor method for interpolation.
- the weight is derived with the latitude weight of the position of the mapped point, instead of the position of the point on the sampling grid that is used to derive the sample value directly.
- prior error metrics do not satisfactorily account for calculation of the error of the chroma components.
- For the case of chroma 4:4:4, a similar approach as for the luma component can be used for the chroma components. However, for the case of subsampled chroma components, the sampling grids of chroma and luma might be different. For example, for the 4:2:0 chroma format, a typical sampling grid relationship between luma and chroma is shown in FIG. 5, where chroma is located in the grid positions marked "Dx" and luma is located in the grid positions marked "Ux". As shown, the chroma sampling grid is aligned with the luma sampling grid in the horizontal direction, but has a 0.5-sample offset in the vertical direction. Proper care has to be taken in deriving the latitude weights for the chroma components by accounting for these offsets between the luma and chroma components.
- exemplary quality evaluation methods for 360-degree video are disclosed.
- the present disclosure further provides a view-port based weighted average PSNR.
- overlapped view-ports are predefined to cover the sphere.
- the view-port is a 2D planar picture projected from sphere.
- a PSNR may then be calculated based on the view-port and its reference generated with the reference sphere.
- all PSNR values of the view-ports are averaged, with the weight determined based on viewing probability; for example, the weight is larger for those view-ports close to the equator.
- a weighting as described in M. Yu et al. is used.
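- As a sketch of the view-port based metric above: assuming each predefined view-port has already been rendered and its PSNR computed against the reference sphere, the per-view-port values are simply combined with viewing-probability weights (larger near the equator). How the weights are derived is left to the caller; the function name is illustrative.

```python
def viewport_weighted_psnr(psnr_per_viewport, weight_per_viewport):
    """Weighted average of per-view-port PSNR values.

    psnr_per_viewport / weight_per_viewport: equal-length lists; weights encode
    viewing probability (e.g., larger for view-ports near the equator).
    """
    total_weight = sum(weight_per_viewport)
    weighted_sum = sum(p * w for p, w in zip(psnr_per_viewport, weight_per_viewport))
    return weighted_sum / total_weight
```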
- metrics disclosed herein are used in rate-distortion optimization when an ERP picture is coded.
- the distortion is evaluated with the sum of square error (SSE), which is directly related to PSNR.
- SSE sum of square error
- the distortion calculation may be replaced by the weighted distortion calculation proposed in exemplary embodiments, which addresses the sampling issue in ERP. Specifically, given a coding unit of size N×M, the distortion of the block can be calculated as:
- Dist(CU) = k · Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} ( w_{n,m} · D_{n,m} ),  (n, m) ∈ CU
- w_{n,m} is the weight for the sample;
- D_{n,m} is the distortion of the sample;
- k is the normalization factor.
- the normalization factor k may also be evaluated globally to reduce the computation complexity.
- the modified rate-distortion optimization may be applied in the determination of various coding parameters (e.g. motion vector, mode, quantization parameter, tree structure) for use in processes such as motion estimation, inter or intra mode decision, quantization, and tree structure (quadtree, binary-tree) splitting decision.
- various coding parameters e.g. motion vector, mode, quantization parameter, tree structure
- processes such as motion estimation, inter or intra mode decision, quantization, and tree structure (quadtree, binary-tree) splitting decision.
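- A minimal sketch of this weighted block distortion for an ERP-coded picture follows, assuming a per-row cos(θ) weight (the latitude-dependent part of the ERP sample area) and a caller-supplied normalization factor k; the block layout, sample-center convention, and function name are assumptions for illustration only.

```python
import numpy as np

def weighted_cu_distortion(orig_block, recon_block, top_row, pic_height, k=1.0):
    """Weighted SSE Dist(CU) = k * sum(w[n,m] * D[n,m]) for an N x M coding unit.

    orig_block / recon_block: 2D numpy arrays holding the CU samples from an ERP
    picture of height pic_height; top_row is the row index of the CU's first line.
    The per-sample weight is cos(theta) of the sample's latitude.
    """
    num_rows = orig_block.shape[0]
    rows = top_row + np.arange(num_rows)
    theta = (0.5 - (rows + 0.5) / pic_height) * np.pi   # latitude of each CU row
    weights = np.cos(theta)[:, None]                     # one weight per row, shared by all columns
    diff = orig_block.astype(np.float64) - recon_block.astype(np.float64)
    return k * float(np.sum(weights * diff * diff))
```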
- AW-SPSNR Area Weighted Spherical PSNR
- Exemplary embodiments disclosed herein make use of a metric referred to herein as Area Weighted Spherical PSNR (AW-SPSNR) to compare panoramic videos for 360-degree video quality evaluation.
- Spherical PSNR compares two videos based only on a subset of the available set of samples.
- embodiments disclosed herein operate to consider the entire set of samples available in the picture and weigh the errors according to the solid angle covered by the samples.
- Let (ue, ve) be the coordinate location of a pixel on the ERP picture. Its corresponding longitude and latitude position can be obtained using the following formulas:
- φ = (ue/W − 0.5) · 2π    (3)
- θ = (0.5 − ve/H) · π    (4)
- W and H are the width and height of the ERP picture, respectively.
- the area is dependent only on the latitude because (θ, φ) is evenly sampled. Thus, for all the samples at a given latitude θ, the error may be weighted by cos(θ). For samples of infinitesimal size, the sum of all these weights would equal the total surface area of the unit sphere, which is 4π, and the weights could be normalized using division by 4π. The sum of weights for samples of finite size, however, is not exactly 4π. Thus, to normalize the weights in some embodiments, the weights of all the samples are summed, and the weights are normalized with the resulting actual sum.
- the distortion of samples on the sphere is measured by considering even sampling.
- (uei, vei) is the i-th sample position on an ERP picture; W and H are the width and height of the ERP picture, respectively;
- Ref(c, x, y) and I(c, x, y) are the reference picture and the picture to be evaluated, respectively, of component c, where c may be luma, Cb or Cr;
- SWD is the sum of weighted distortion; and SW is the sum of weight.
- the distortion (squared error) Di is calculated for the i-th sample point: Di = ( Ref(c, uei, vei) − I(c, uei, vei) )²
- the latitude θi of the position of the i-th sample point is calculated: θi = (0.5 − vei/H) · π
- the distortion is weighted with the obtained latitude weight: wDi = cos(θi) · Di
- the sum of weighted distortion (SWD) and the sum of weights (SW) are incremented by the weighted distortion (wDi) and by the current weight cos(θi), respectively.
- the AW-SPSNR of component c may be calculated as follows, where P is the peak value of the sample value.
- AW-SPSNR(c) = 10 · log10( P² / (SWD/SW) )
- AW-SPSNR is represented in decibels in the equation above. It should be understood that in other embodiments, AW-SPSNR is represented using a measure other than decibels.
- AW-SPSNR may be calculated over any desired subset of samples in the projection picture. For example, it may be calculated over the full set of samples in an encoded/decoded ERP representation (to evaluate overall distortion of the encoded representation), or it may be calculated using only the set of samples in a particular coding unit (CU) or prediction unit (PU), or some other subset of samples relevant to a particular coding decision (e.g., to support the making of rate-distortion optimized decisions within a video encoder).
- CU coding unit
- PU prediction unit
- some other subset of samples relevant to a particular coding decision, e.g., to support the making of rate-distortion optimized decisions within a video encoder.
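- The AW-SPSNR computation for an ERP picture can be sketched as follows, using cos(θ) as the per-sample area weight (the constant dθ·dφ factor cancels in the SWD/SW ratio). The sample-center convention and the default peak value are assumptions of this sketch.

```python
import numpy as np

def aw_spsnr_erp(ref, rec, peak=255.0):
    """Area-weighted spherical PSNR over a full ERP picture (one component).

    ref / rec: 2D numpy arrays of the reference and evaluated pictures, same size.
    Each sample's weight is cos(theta) of its latitude; SW is the actual sum of
    the finite-sample weights (not exactly 4*pi), as discussed above.
    """
    H, W = ref.shape
    theta = (0.5 - (np.arange(H) + 0.5) / H) * np.pi   # latitude of each row
    w = np.repeat(np.cos(theta)[:, None], W, axis=1)   # per-sample area weight
    d = (ref.astype(np.float64) - rec.astype(np.float64)) ** 2
    swd = float(np.sum(w * d))                         # sum of weighted distortions
    sw = float(np.sum(w))                              # sum of weights
    return 10.0 * float(np.log10(peak ** 2 / (swd / sw)))
```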
- the area calculation in Equation (5) is changed according to projection format, so the weight may be different from that in ERP.
- Embodiments disclosed herein accommodate different weight calculations for different projection formats. Exemplary embodiments proceed to determine weights as follows. Consider a mapping in which (xg, yg) are the coordinates of a point in a given geometry space and (θ, φ) are the corresponding latitude and longitude position of this sample on the unit sphere.
- the geometry mapping may be expressed as functions f and g, where f and g are different for different types of projection.
- the functions f and g satisfy the following relationship:
- θ = f(xg, yg)    (6)
- φ = g(xg, yg)    (7)
- the total derivatives dθ and dφ are computed. Since θ and φ are functions of both xg and yg, the partial derivatives are first computed and then the total derivatives are computed. Let ∂θ/∂xg and ∂θ/∂yg be the partial derivatives of θ with regard to xg and yg, respectively. Similarly, let ∂φ/∂xg and ∂φ/∂yg be the partial derivatives of φ with regard to xg and yg, respectively. The computation of dθ and dφ may then be as follows:
- the area may then be computed using equation (5).
- AW-SPSNR weights for EAP may then be computed using equation (5).
- AW-SPSNR is determined for systems using equal-area projection.
- the geometry mapping between the pixel coordinates and the position on the unit sphere is as follows:
- Cubemap has six symmetric faces. The weights derived for one face can be used for all the six faces. The calculation is done for the face ABCD in FIG. 3.
- θ is a function of both x and y.
- the partial derivatives ∂θ/∂x and ∂θ/∂y are computed and then the total derivative dθ.
- the partial derivatives are:
- the Euclidean norm may be calculated, assuming dx and dy to be equal, as:
- the area of the sample may be determined as follows in Eq. (24), and this area may be used as the weight of the sample in embodiments disclosed herein:
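- Eq. (24) is not reproduced in the text above; as one way to obtain such a weight, the sketch below uses the standard solid-angle element of a cube face at unit distance from the sphere center, dΩ = dx·dy / (1 + x² + y²)^(3/2). The face parameterization and sample-center convention are assumptions of this sketch.

```python
import math

def cubemap_sample_weight(i, j, face_size):
    """Approximate spherical area (solid angle) covered by sample (i, j) of one cube face.

    The face spans (x, y) in [-1, 1] x [-1, 1] at unit distance from the sphere
    center; the solid-angle element is dx*dy / (1 + x^2 + y^2)^(3/2). By symmetry
    the same weight table can be reused for all six faces.
    """
    step = 2.0 / face_size
    x = -1.0 + (i + 0.5) * step
    y = -1.0 + (j + 0.5) * step
    return (step * step) / (1.0 + x * x + y * y) ** 1.5
```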
- points from the uniformly sampled set of points on the unit sphere are mapped onto the projection geometry. If a mapped point is not on the integer sampling grid, various interpolation filters, including bilinear and bicubic filters, are used to obtain the pixel value at the mapped position.
- the ultimate goal in quality comparison should be the comparison of the samples on the sampling grid. Thus, it is desirable to use nearest neighbor interpolation in quality comparison.
- SPSNR may be used for 360 video quality evaluation in different ways.
- SPSNR is applied in two ways: intermediate and end-to-end quality evaluation, as shown in FIG. 6.
- In end-to-end quality evaluation, the reference for the SPSNR calculation is the original video, and the test is the reconstructed video after decoding and projection format conversion.
- the two inputs to the SPSNR calculation, that is, the reference signal and the test signal, have the same projection format and resolution.
- In intermediate quality evaluation, the reference is still the original high-resolution video, but the test is the reconstructed video right after decoding but before projection format conversion. Since the inverse projection format conversion has not yet been applied, the test signal may have a different projection format and/or resolution compared to that of the reference video.
- the intermediate quality evaluation is also referred to as "cross format" quality evaluation.
- the point S from the set of points uniformly sampled on the sphere is mapped to the point g in the original video and the point q in the reconstructed video Pano1.
- the error between the sample value at point g and the sample value at point q is calculated for the quality evaluation between original video and reconstructed video.
- the sample value at point g or q may be derived with the sample at its nearest integer sampling point if the point g or q is not at an integer sampling position to avoid introducing additional interpolation error in quality evaluation.
- g' and q' are the nearest integer sampling point of g and q, respectively.
- the sample error calculation process is summarized in the following steps, illustrated in FIG. 7.
- a sample point S is selected from a set of points that is substantially uniformly sampled on a sphere.
- step 701 map a point S (S is from the set of points uniformly sampled on a sphere) to point g in the original video 710 in the projection plane.
- step 702 round point g to nearest neighbor point g' at an integer sampling position.
- step 703 map point S from the set of points uniformly sampled on the sphere to point q in the reconstructed video 712 in the projection plane.
- step 704 round point q to nearest neighbor point q' at an integer sampling position; and subsequently calculate the error between the sample value at point g' and the sample value at point q'.
- coordinate mapping from the sphere to the projection plane can be performed in the reverse direction; that is, the points g' and q' on the projection plane can be mapped back to S(g') and S(q') on the sphere. If the original video and the reconstructed video have the same projection format and the same resolution, S(g') and S(q') will be mapped back to the same position on the sphere (as indicated by S' in FIG. 7). However, if the original video and the reconstructed video have different projection formats, different resolutions, or both, then after the inverse coordinate mapping, S(g') and S(q') may correspond to different positions on the sphere (not shown in FIG. 7).
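- The FIG. 7 procedure can be sketched as below for the case where both videos share one projection format and resolution; sphere_to_plane stands in for the format's coordinate mapping, and the clipping and peak value are illustrative assumptions.

```python
import math

def spsnr_nn_same_format(sphere_points, orig, rec, sphere_to_plane, peak=255.0):
    """SPSNR with nearest-neighbor rounding (FIG. 7), same projection format/resolution.

    sphere_points: iterable of (phi, theta) uniformly sampled on the sphere;
    sphere_to_plane: maps (phi, theta) to continuous plane coordinates (ue, ve);
    orig / rec: 2D sample arrays of the original and reconstructed pictures.
    """
    H, W = orig.shape
    sse = 0.0
    count = 0
    for phi, theta in sphere_points:
        ue, ve = sphere_to_plane(phi, theta)           # steps 701 / 703: map S to g and q
        x = min(max(int(round(ue)), 0), W - 1)         # steps 702 / 704: round to the nearest
        y = min(max(int(round(ve)), 0), H - 1)         #   integer sampling position (g' = q')
        diff = float(orig[y, x]) - float(rec[y, x])    # error between g' and q'
        sse += diff * diff
        count += 1
    return 10.0 * math.log10(peak ** 2 / (sse / count))
```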
- exemplary sample error calculation methods for SPSNR with nearest neighbor are proposed as follows. Such embodiments may be used when, for example, the original video and the reconstructed video are not in the same projection format or do not have the same resolution.
- An exemplary error calculation method is illustrated in FIG. 8. A method as illustrated in FIG. 8 may operate to minimize the distance of the two points between which the sample error is calculated, in order to reduce the inaccuracy caused by the non-aligned spherical coordinates due to nearest neighbor rounding in the projection plane.
- an exemplary SPSNR method may be performed as follows.
- a sample point S is selected from a set of points that is substantially uniformly sampled on a sphere.
- step 801 map point S to point q in the reconstructed video 812.
- step 802 round point q to nearest neighbor point q' at an integer sampling position in the reconstructed video domain.
- step 803 perform inverse coordinate mapping of the point q' back onto the sphere at S(q').
- step 804 perform coordinate mapping from the spherical coordinate S(q') to the original video projection domain 810 at the position g.
- step 805 round point g to nearest neighbor point g' at an integer sampling position, and subsequently calculate the error between the sample value at point g' in the original video and the sample value at point q' in the reconstructed video.
- such a method may be performed for each of a plurality of points S selected on the sphere, with squared errors for each point being summed to generate a distortion metric.
- the calculated errors are weighted in the summation process, using techniques disclosed herein or other weighting schemes.
- a method of error calculation may be performed as follows, in which the roles of the original video and reconstructed video are reversed as compared to the above steps.
- a sample point S is selected from a set of points that is substantially uniformly sampled on a sphere (analogous to step 800).
- Point S is mapped to point q in the original video (analogous to step 801, except that video 812 now represents the original video).
- Point q is rounded to nearest neighbor point q' at an integer sampling position in the original video domain (analogous to step 802, except in the original video).
- Inverse coordinate mapping of the point q' back onto the sphere at S(q') is performed (analogous to step 803).
- Coordinate mapping is performed from the spherical coordinate S(q') to the reconstructed video projection domain at the position g (analogous to step 804, except that video 810 now represents the reconstructed video).
- Point g is rounded to the nearest neighbor point g' at an integer sampling position (analogous to step 805), and the error is calculated between the sample value at point g' in the reconstructed video and the sample value at point q' in the original video.
- the sample error is calculated between the points q' and g'.
- the distance between spherical point S(q') and spherical point S(g') on sphere is measured with the distance between g (which maps to the coordinate q' in the reconstructed video projection domain) and g' in the original/reconstructed video projection domain.
- Examples of coordinate points such as may be used in the method of FIG. 7 are also shown in FIG. 8 as go (directly mapped from the spherical coordinate S) and go' (rounded coordinate based on go). The method of FIG. 7 effectively measures the sample error between the points q' and go'.
- the distance between spherical point S(q') and spherical point S(go') on sphere is measured with the distance between g (which maps to the coordinate q' in the reconstructed video projection domain) and go' in the original/reconstructed video projection domain.
- g which maps to the coordinate q' in the reconstructed video projection domain
- a method according to FIG. 8 can operate to minimize the distance between the two points between which the sample error is calculated.
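- A sketch of the FIG. 8 error calculation is given below. The three mapping functions (sphere to reconstructed plane, reconstructed plane back to sphere, sphere to original plane) are assumed to be supplied for the two projection formats in use; the names, clipping, and peak value are illustrative.

```python
import math

def spsnr_nn_cross_format(sphere_points, orig, rec,
                          to_rec_plane, rec_plane_to_sphere, to_orig_plane,
                          peak=255.0):
    """SPSNR with nearest neighbor for differing projection formats/resolutions (FIG. 8)."""
    Ho, Wo = orig.shape
    Hr, Wr = rec.shape
    sse = 0.0
    n = 0
    for phi, theta in sphere_points:
        ue, ve = to_rec_plane(phi, theta)                   # step 801: map S into the reconstructed video
        xq = min(max(int(round(ue)), 0), Wr - 1)            # step 802: round q to q'
        yq = min(max(int(round(ve)), 0), Hr - 1)
        phi2, theta2 = rec_plane_to_sphere(xq, yq)          # step 803: inverse-map q' to S(q')
        ug, vg = to_orig_plane(phi2, theta2)                # step 804: map S(q') into the original video
        xg = min(max(int(round(ug)), 0), Wo - 1)            # step 805: round g to g'
        yg = min(max(int(round(vg)), 0), Ho - 1)
        diff = float(orig[yg, xg]) - float(rec[yq, xq])     # error between g' and q'
        sse += diff * diff
        n += 1
    return 10.0 * math.log10(peak ** 2 / (sse / n))
```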
- SPSNR may be calculated without necessarily considering the resolutions of the original and reconstructed video in their projection domain(s). In some embodiments, however, effects due to the original video and the reconstructed video having different resolutions are taken into consideration. In some such embodiments, the original and the reconstructed video may have the same projection format.
- the video having the lower resolution is selected (from between the original video and the reconstructed video). A point S on the sphere is mapped to point q in the selected lower-resolution video, as depicted in step 801 of FIG. 8.
- the rounding of coordinate g to g' in step 805 is thus performed in the higher-resolution video. Performing the final rounding of the coordinate g in the higher resolution video incurs a smaller rounding error, since the sampling grid in the higher resolution video is denser and more accurate.
- a method as described with respect to FIG. 8 may be applied in a case where the original and the reconstructed video also have different projection formats.
- step 801 is performed on the video (original or reconstructed) with the lower resolution, and the rounding of step 805 is performed in the video with higher resolution.
- higher resolution may not always translate to denser sampling in all areas on the sphere.
- higher resolution generally represents a denser and more accurate sampling grid, and may be better suited for final rounding in step 805.
- the methods described with respect to FIG. 8 perform additional steps of 2D-to-3D and 3D-to-2D coordinate mapping. This may lead to increased computational complexity.
- the coordinate mapping can be pre-calculated and stored in a lookup table, and re-used on a frame-by-frame basis to reduce computation complexity.
- methods as described with respect to FIG. 8 may be employed in situations where the original video and reconstructed video have either different projection format or different resolution. Such methods may further be implemented in cases where the luma and chroma components have different resolutions, such as 4:2:0 chroma format. In such instances, a method as described with respect to FIG. 8 may be applied for each component's quality evaluation separately. For example, a method may be applied with luma component resolution and sampling grid to perform luma quality evaluation, and the method may be separately applied with chroma component resolution and sampling grid to perform chroma quality evaluation.
- the sphere point S(g') corresponding to g' in the original video and the sphere point S(q') corresponding to q' in the reconstructed video may not be located at the same position on the sphere. This is the result of the rounding operation in step 805.
- the point S(q') on the sphere is likely to be different from S(g') when the projection format of reconstructed video and/or the resolution of the reconstruction video is different from that of the original video.
- the position misalignment on the sphere between reference sample and reconstructed sample is addressed using interpolation in the reconstructed video.
- FIG. 9 includes the following steps.
- a sample point S is selected from a set of points that is substantially uniformly sampled on a sphere.
- step 901 map point S (where S is from the set of points uniformly sampled on sphere) to point g in the original video 910.
- step 902 round point g to nearest neighbor point g' at an integer sampling position in the original video domain.
- step 903 perform inverse coordinate mapping of the point g' back onto the sphere at S(g').
- step 904 perform coordinate mapping from the spherical coordinate S(g') to the reconstructed video projection domain 912 at the position q. Apply interpolation to derive the sample value at point q using its neighboring sample values at integer sampling positions if q is not located at an integer sampling position. No interpolation is applied if q is at an integer sampling position. The error between the sample value at point g' in the original video and the interpolated sample value at point q in the reconstructed video is then calculated.
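- The per-point error of the FIG. 9 method might be sketched as follows, with bilinear filtering used as one possible interpolation (other filters could be substituted); the mapping-function arguments are placeholders for the projection formats in use.

```python
import math

def bilinear_sample(img, ue, ve):
    """Bilinearly interpolate a sample value at fractional coordinates (ue, ve)."""
    H, W = img.shape
    x0 = min(max(int(math.floor(ue)), 0), W - 2)
    y0 = min(max(int(math.floor(ve)), 0), H - 2)
    a = min(max(ue - x0, 0.0), 1.0)
    b = min(max(ve - y0, 0.0), 1.0)
    return ((1 - a) * (1 - b) * img[y0, x0] + a * (1 - b) * img[y0, x0 + 1] +
            (1 - a) * b * img[y0 + 1, x0] + a * b * img[y0 + 1, x0 + 1])

def fig9_sample_error(phi, theta, orig, rec,
                      to_orig_plane, orig_plane_to_sphere, to_rec_plane):
    """Squared error for one sphere point S: round in the original, interpolate in the reconstruction."""
    Ho, Wo = orig.shape
    ug, vg = to_orig_plane(phi, theta)                 # step 901: map S into the original video
    xg = min(max(int(round(ug)), 0), Wo - 1)           # step 902: round g to g'
    yg = min(max(int(round(vg)), 0), Ho - 1)
    phi2, theta2 = orig_plane_to_sphere(xg, yg)        # step 903: inverse-map g' to S(g')
    uq, vq = to_rec_plane(phi2, theta2)                # step 904: map S(g') into the reconstructed video
    diff = float(orig[yg, xg]) - float(bilinear_sample(rec, uq, vq))
    return diff * diff
```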
- An SPSNR calculation using a technique described with respect to FIG. 4 applies interpolation filtering to both the original signal and the reconstructed video.
- the interpolation filter may change the characteristics of original signal and thus will affect the PSNR calculation.
- the method described with respect to FIG. 7 does not apply interpolation filtering, and it can be used when the original video and the reconstructed video have the same projection format and resolution.
- the sample position misalignment on the sphere between the reference sample and the reconstructed sample degrades the SPSNR calculation when the original video and reconstructed video do not have the same projection format and resolution.
- the misalignment also makes SPSNR comparison unreliable when comparing different projection formats or the same projection format with different resolution.
- the method described with respect to FIG. 9 addresses the problem of sample position misalignment by applying the interpolation filtering to the reconstructed video as appropriate.
- the method of FIG. 9 does not apply the interpolation filtering to the original signal, so it will not change the characteristics of the original signal for PSNR calculation.
- a method such as that illustrated with respect to FIG. 9 may be used in calculating an SPSNR that is appropriate for cross-format quality comparison.
- the following method may be used in weighted distortion computation for L-SPSNR calculation for luma component or chroma components in 4:4:4 chroma format.
- a set of substantially uniformly distributed points is selected on the unit sphere.
- the number of points selected may vary in different embodiments. Any one of a variety of techniques may be used to select substantially uniform points on the unit sphere, including random selection or selection using algorithmic techniques such as spiral point selection or charged particle simulation.
- (ui, vi) is i-th sample point position of the uniformly sampled set of points on the unit sphere; (uei, vei) is a mapped position on an ERP picture; W, H are the width and height of the ERP picture, respectively; Ref(c, x, y) and I(c, x, y) are the reference picture and the picture to be evaluated, respectively, of component c, where c may be, e.g., luma, Cb or Cr; W(0) is the weight function to define the weight at the latitude ⁇ .
- SWD is the sum of weighted distortion, and SW is the sum of weights.
- the nearest neighbor (uni, vni) of the mapped point is found on the sampling grid:
- the distortion (squared error) Di is calculated at this nearest neighbor sample point:
- the latitude weight of the new latitude is obtained.
- interpolation may be applied to derive the weight at the input latitude θ.
- the L- SPSNR of component c is calculated as follows. P is the peak value of the sample value.
- L-SPSNR(c) = 10 · log10( P² / (SWD/SW) )
- the distortion and the weight are aligned at the same position on the sphere.
- the calculated value of L-SPSNR may more accurately reflect video quality.
- L-SPSNR is represented in decibels in the equation above. It should be understood that in other embodiments, L-SPSNR is represented using a measure other than decibels.
- Chroma sample location type 0 is most widely used for the 4:2:0 chroma format. It has a misalignment of 0.5 samples in the vertical direction.
- Exemplary steps that may be used in calculating L-SPSNR for chroma components include the following.
- a point i is selected from the uniformly sampled set of points on the unit sphere and is mapped onto the ERP picture as follows.
- the nearest neighbor position of the mapped point is found on the chroma sampling grid
- the distortion (squared error) is calculated at this nearest neighbor sample point for component c (either Cb or Cr);
- the position of the nearest neighbor is found on the luma grid.
- vni_l = vni_c + offset_y
- the latitude weight of the new latitude is found.
- the distortion is weighted with the obtained latitude weight, and the weighted distortion and the weight are accumulated.
- the L-SPSNR of component c is calculated as follows. P is the peak value of the sample value.
- L-SPSNR(c) = 10 · log10( P² / (SWD/SW) ). It may be noted that the value of L-SPSNR is represented in decibels in the equation above. It should be understood that in other embodiments, L-SPSNR is represented using a measure other than decibels.
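- As a sketch of the chroma weight derivation for 4:2:0 (chroma sample location type 0), the chroma nearest-neighbor row can be expressed on the luma grid before the latitude weight is looked up. The factor of 2 and the 0.5-luma-sample vertical offset used below are assumptions, since the exact equation is not reproduced in the text above; weight_fn stands for the latitude weight function W(θ).

```python
import math

def chroma_latitude_weight(vni_c, luma_height, weight_fn, offset_y=0.5):
    """Latitude weight for a chroma nearest-neighbor row in the 4:2:0 chroma format.

    vni_c: vertical index of the nearest chroma sample; luma_height: H of the
    luma ERP picture; weight_fn: W(theta), e.g. an interpolated latitude profile.
    The chroma row is first mapped onto the luma grid, including the vertical
    offset between the two grids, and the latitude is derived at that position.
    """
    vni_l = 2.0 * vni_c + offset_y                          # chroma row expressed on the luma grid
    theta = (0.5 - (vni_l + 0.5) / luma_height) * math.pi   # latitude of that luma-grid position
    return weight_fn(theta)
```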
- weight values may be generated in a variety of ways, including custom weights, for the different weighted SPSNR metrics.
- weights are derived across a wide variety of sequences, and these weights are fixed and applied to new sequences.
- pre-trained weights may be used for rate-distortion optimization decisions in a newly encoded sequence, even if that sequence does not contribute to the training of the weights.
- the weighted SPSNR calculation method described above can also be applied in the determination of coding parameters (e.g. motion vector, mode, quantization parameter, tree structure) that are based on rate-distortion (R-D) optimization.
- R-D rate-distortion
- decisions may be used in, for example, motion estimation, inter or intra mode decisions, quantization and tree (quadtree, binary-tree) structure splitting decisions.
- S-PSNR and/or L-PSNR measures may be used in an R-D-based decision process.
- a block being coded is mapped to a corresponding region on the unit sphere.
- instead of using sample points from across the entire unit sphere, only points within that corresponding region on the unit sphere are used for the S-PSNR and/or L-PSNR calculation.
- the resulting S-PSNR and/or L-PSNR measure may be used for R-D optimization.
- Embodiments disclosed herein, like the HEVC and JEM software, are built upon a block-based hybrid video coding framework.
- FIG. 10 is a functional block diagram of a block-based hybrid video encoding system.
- the input video signal 1002 is processed block by block.
- extended block sizes, called a coding unit (CU), are used.
- a CU can be up to 64x64 pixels, and a bigger block size of up to 256x256 is allowed in JEM.
- a CU can be further partitioned into prediction units (PU), for which separate prediction methods are applied.
- PU prediction units
- Spatial prediction (or “intra prediction”) uses pixels from the already coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces spatial redundancy inherent in the video signal.
- Temporal prediction (also referred to as “inter prediction” or “motion compensated prediction”) uses reconstructed pixels from the already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal. Temporal prediction signal for a given video block is usually signaled by one or more motion vectors which indicate the amount and the direction of motion between the current block and its reference block. Also, if multiple reference pictures are supported (as is the case for the recent video coding standards such as H.264/AVC or HEVC), then for each video block, its reference picture index is sent additionally; and the reference index is used to identify from which reference picture in the reference picture store (1064) the temporal prediction signal comes.
- inter prediction motion compensated prediction
- the mode decision block (1080) in the encoder chooses the best prediction mode.
- the best prediction mode is selected using a rate-distortion optimization method in which distortion is measured using one or more of the techniques described herein, such as Area Weighted PSNR.
- the prediction block is then subtracted from the current video block (1016); and the prediction residual is de-correlated using transform (1004) and quantized (1006) to achieve the target bit-rate.
- the quantized residual coefficients are inverse quantized (1010) and inverse transformed (1012) to form the reconstructed residual, which is then added back to the prediction block (1026) to form the reconstructed video block.
- in-loop filtering such as de-blocking filter and Adaptive Loop Filters may be applied (1066) on the reconstructed video block before it is put in the reference picture store (1064) and used to code future video blocks.
- the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit (1008) to be further compressed and packed to form the bit-stream.
- Various hardware elements of the described embodiments are referred to as "modules" that carry out (i.e., perform, execute, and the like) the various functions that are described herein in connection with the respective modules.
- a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation.
- ASICs application-specific integrated circuits
- FPGAs field programmable gate arrays
- Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and it is noted that those instructions could take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer- readable medium or media, such as commonly referred to as RAM, ROM, etc.
- Exemplary embodiments disclosed herein are implemented using one or more wired and/or wireless network nodes, such as a wireless transmit/receive unit (WTRU) or other network entity.
- WTRU wireless transmit/receive unit
- FIG. 11 is a system diagram of an exemplary WTRU 1102, which may be employed as a video encoder and/or an apparatus for video quality evaluation in embodiments described herein.
- the WTRU 1102 may include a processor 1118, a communication interface 1119 including a transceiver 1120, a transmit/receive element 1122, a speaker/microphone 1124, a keypad 1126, a display/touchpad 1128, a non-removable memory 1130, a removable memory 1132, a power source 1134, a global positioning system (GPS) chipset 1136, and sensors 1138.
- GPS global positioning system
- the processor 1118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like.
- the processor 1118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1102 to operate in a wireless environment.
- the processor 1118 may be coupled to the transceiver 1120, which may be coupled to the transmit/receive element 1122. While FIG. 11 depicts the processor 1118 and the transceiver 1120 as separate components, it will be appreciated that the processor 1118 and the transceiver 1120 may be integrated together in an electronic package or chip.
- the transmit/receive element 1122 may be configured to transmit signals to, or receive signals from, a base station over the air interface 1116.
- the transmit/receive element 1122 may be an antenna configured to transmit and/or receive RF signals.
- the transmit/receive element 1122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples.
- the transmit/receive element 1122 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 1122 may be configured to transmit and/or receive any combination of wireless signals.
- the WTRU 1102 may include any number of transmit/receive elements 1122. More specifically, the WTRU 1102 may employ MIMO technology. Thus, in one embodiment, the WTRU 1102 may include two or more transmit/receive elements 1122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 1116.
- the transceiver 1120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1122 and to demodulate the signals that are received by the transmit/receive element 1122.
- the WTRU 1102 may have multi-mode capabilities.
- the transceiver 1120 may include multiple transceivers for enabling the WTRU 1102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, as examples.
- the processor 1118 of the WTRU 1102 may be coupled to, and may receive user input data from, the speaker/microphone 1124, the keypad 1126, and/or the display/touchpad 1128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
- the processor 1118 may also output user data to the speaker/microphone 1124, the keypad 1126, and/or the display/touchpad 1128.
- the processor 1118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1130 and/or the removable memory 1132.
- the non-removable memory 1130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
- the removable memory 1132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
- SIM subscriber identity module
- SD secure digital
- the processor 1118 may access information from, and store data in, memory that is not physically located on the WTRU 1102, such as on a server or a home computer (not shown).
- the processor 1118 may receive power from the power source 1134, and may be configured to distribute and/or control the power to the other components in the WTRU 1102.
- the power source 1134 may be any suitable device for powering the WTRU 1102.
- the power source 1134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, and the like.
- the processor 1118 may also be coupled to the GPS chipset 1136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1102.
- location information e.g., longitude and latitude
- the WTRU 1102 may receive location information over the air interface 1116 from a base station and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 1102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
- the processor 1118 may further be coupled to other peripherals 1138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
- the peripherals 1138 may include sensors such as an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
- sensors such as an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module
- FIG. 12 depicts an exemplary network entity 1290 that may be used in embodiments of the present disclosure, for example as a video encoder and/or an apparatus for video quality evaluation.
- network entity 1290 includes a communication interface 1292, a processor 1294, and non-transitory data storage 1296, all of which are communicatively linked by a bus, network, or other communication path 1298.
- Communication interface 1292 may include one or more wired communication interfaces and/or one or more wireless-communication interfaces. With respect to wired communication, communication interface 1292 may include one or more interfaces such as Ethernet interfaces, as an example. With respect to wireless communication, communication interface 1292 may include components such as one or more antennae, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art. And further with respect to wireless communication, communication interface 1292 may be equipped at a scale and with a configuration appropriate for acting on the network side— as opposed to the client side— of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like). Thus, communication interface 1292 may include the appropriate equipment and circuitry (perhaps including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.
- wireless communication interface 1292 may include the appropriate equipment and circuitry (perhaps including multiple transceivers)
- Processor 1294 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.
- Data storage 1296 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM) to name but a few, as any one or more types of non-transitory data storage deemed suitable by those of skill in the relevant art could be used. As depicted in FIG. 12, data storage 1296 contains program instructions 1297 executable by processor 1294 for carrying out various combinations of the various network-entity functions described herein.
- Examples of computer-readable storage media include a read-only memory (ROM), a random-access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).
- a processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Systems and methods are described for determining a distortion metric for the coding of a spherical video. In spherical video, there is a mapping between a given sample geometry and respective points on a unit sphere. In some embodiments, distortion is measured at each sample of interest, and the distortion of each sample is weighted by the area on the unit sphere associated with that sample. In some embodiments, a plurality of points on the unit sphere are selected, and the points are mapped to a nearest sample in the given geometry. Distortion is calculated at the nearest sample points and is weighted by a latitude-dependent weighting based on the latitude of the respective nearest sample point. The latitude-dependent weighting may be based on a viewing probability for that latitude.
Applications Claiming Priority (8)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201662364197P | 2016-07-19 | 2016-07-19 | |
| US62/364,197 | 2016-07-19 | ||
| US201662367404P | 2016-07-27 | 2016-07-27 | |
| US62/367,404 | 2016-07-27 | ||
| US201762454547P | 2017-02-03 | 2017-02-03 | |
| US62/454,547 | 2017-02-03 | ||
| US201762466712P | 2017-03-03 | 2017-03-03 | |
| US62/466,712 | 2017-03-03 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018017599A1 true WO2018017599A1 (fr) | 2018-01-25 |
Family
ID=59626662
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2017/042646 Ceased WO2018017599A1 (fr) | 2016-07-19 | 2017-07-18 | Système et procédé d'évaluation de qualité pour vidéo à 360 degrés |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018017599A1 (fr) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018175215A1 (fr) * | 2017-03-23 | 2018-09-27 | Qualcomm Incorporated | Paramètres adaptatifs pour le codage d'une vidéo à 360 degrés |
| WO2018200293A1 (fr) * | 2017-04-28 | 2018-11-01 | Microsoft Technology Licensing, Llc | Codage d'images |
| CN112435218A (zh) * | 2020-11-04 | 2021-03-02 | 南京火眼锐视信息科技有限公司 | 一种文档图像的形变度评估、筛选方法和装置 |
| CN114972267A (zh) * | 2022-05-31 | 2022-08-30 | 腾讯音乐娱乐科技(深圳)有限公司 | 全景视频评价方法、计算机设备和计算机程序产品 |
- 2017-07-18: WO PCT/US2017/042646 patent WO2018017599A1 (fr), status: not active (Ceased)
Non-Patent Citations (3)
| Title |
|---|
| M. YU; H. LAKSHMAN; B. GIROD: "A Framework to Evaluate Omnidirectional Video Coding Schemes", IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY, 2015 |
| VISHWANATH B ET AL: "AHG8: Area Weighted Spherical PSNR for 360 video quality evaluation", 4. JVET MEETING; 15-10-2016 - 21-10-2016; CHENGDU; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/,, no. JVET-D0072, 6 October 2016 (2016-10-06), XP030150305 * |
| YULE SUN ET AL: "[FTV-AHG] WS-PSNR for 360 video quality evaluation", 115. MPEG MEETING; 30-5-2016 - 3-6-2016; GENEVA; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11),, no. m38551, 27 May 2016 (2016-05-27), XP030066907 * |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018175215A1 (fr) * | 2017-03-23 | 2018-09-27 | Qualcomm Incorporated | Paramètres adaptatifs pour le codage d'une vidéo à 360 degrés |
| US10904531B2 (en) | 2017-03-23 | 2021-01-26 | Qualcomm Incorporated | Adaptive parameters for coding of 360-degree video |
| WO2018200293A1 (fr) * | 2017-04-28 | 2018-11-01 | Microsoft Technology Licensing, Llc | Codage d'images |
| CN112435218A (zh) * | 2020-11-04 | 2021-03-02 | 南京火眼锐视信息科技有限公司 | 一种文档图像的形变度评估、筛选方法和装置 |
| CN114972267A (zh) * | 2022-05-31 | 2022-08-30 | 腾讯音乐娱乐科技(深圳)有限公司 | 全景视频评价方法、计算机设备和计算机程序产品 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11876981B2 (en) | Method and system for signaling of 360-degree video information | |
| US20220368947A1 (en) | 360-degree video coding using geometry projection | |
| US12445646B2 (en) | 360-degree video coding using face continuities | |
| US20230412839A1 (en) | Geometry Conversion for 360-degree Video Coding | |
| US20210337202A1 (en) | Adaptive quantization method for 360-degree video coding | |
| TW201840181A (zh) | Sphere pole projections for efficient compression of 360-degree video | |
| WO2018017599A1 (fr) | Quality evaluation system and method for 360-degree video | |
| Dziembowski et al. | Virtual view synthesis for 3DoF+ video | |
| KR102882879B1 (ko) | 360-degree video coding using geometry projection | |
| WO2019008233A1 (fr) | Method and apparatus for encoding media content | |
| JP2023512668A (ja) | Augmentation of a 3D point cloud using multiple measurements | |
| WO2018170416A1 | Floating point to integer conversion for 360-degree video projection format conversion and spherical metrics calculation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17752199; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17752199; Country of ref document: EP; Kind code of ref document: A1 |