
WO2018134128A1 - Filtering of video data using a shared look-up table - Google Patents

Filtering of video data using a shared look-up table

Info

Publication number
WO2018134128A1
WO2018134128A1 (application PCT/EP2018/050723)
Authority
WO
WIPO (PCT)
Prior art keywords
lut
filter
weighting coefficients
pixel
inter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2018/050723
Other languages
French (fr)
Inventor
Kenneth Andersson
Per Wennersten
Jacob STRÖM
Jack ENHORN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Publication of WO2018134128A1 publication Critical patent/WO2018134128A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • the present embodiments generally relate to filter apparatus and methods, for example to apparatus and methods for video coding and decoding, and in particular to deringing filtering in video coding and decoding.
  • H.265 High Efficiency Video Coding (HEVC)
  • JCT-VC Joint Collaborative Team on Video Coding
  • Spatial prediction is achieved using intra (I) prediction from within the current picture.
  • a picture consisting of only intra coded blocks is referred to as an I-picture.
  • Temporal prediction is achieved using inter (P) or bi-directional inter (B) prediction on block level.
  • HEVC was finalized in 2013.
  • JVET Joint Video Exploration Team
  • Ringing, also referred to as the Gibbs phenomenon, appears in video frames as oscillations near sharp edges. It is a result of the cut-off of high-frequency information in the block Discrete Cosine Transform (DCT) and lossy quantization process. Ringing also comes from inter prediction, where sub-pixel interpolation using a filter with negative weights can cause ringing near sharp edges. Artificial patterns that resemble ringing can also appear from intra prediction, as shown in the right part of Figure 1 (whereby Figures 1 (A) and (B) illustrate the ringing effect on a zoomed original video frame and a zoomed compressed video frame respectively). The ringing effect degrades the objective and subjective quality of video frames.
  • DCT Discrete Cosine Transform
  • bilateral filtering is widely used in image processing because of its edge-preserving and noise-reducing features.
  • a bilateral filter decides its coefficients based on the contrast of the pixels in addition to the geometric distance.
  • a Gaussian function has usually been used to relate coefficients to the geometric distance and contrast of the pixel values.
  • the weight ω(i, j, k, l) assigned for pixel (k, l) to filter the pixel (i, j) is defined as:

        ω(i, j, k, l) = exp( −((i − k)² + (j − l)²) / (2σ_d²) − (I(i, j) − I(k, l))² / (2σ_r²) )   (1)

    where σ_d is the spatial parameter and σ_r is the range parameter.
  • the bilateral filter is controlled by these two parameters. I(i, j) and I(k, l) are the original intensity levels of pixels (i, j) and (k, l) respectively.
  • I_D(i, j) is the filtered intensity of pixel (i, j):

        I_D(i, j) = Σ I(k, l) · ω(i, j, k, l) / Σ ω(i, j, k, l)   (2)

    where the sums run over the pixel itself and its neighbors.
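The bilateral weighting and normalization above can be sketched in plain Python (the plus-shaped aperture and parameter values are illustrative; function names are ours, not the patent's):

```python
import math

def bilateral_weight(i, j, k, l, img, sigma_d, sigma_r):
    """Weight of reference pixel (k, l) when filtering pixel (i, j): one
    Gaussian term for spatial distance, one for intensity difference."""
    spatial = ((i - k) ** 2 + (j - l) ** 2) / (2.0 * sigma_d ** 2)
    rng = (img[i][j] - img[k][l]) ** 2 / (2.0 * sigma_r ** 2)
    return math.exp(-(spatial + rng))

def filter_pixel(i, j, img, sigma_d, sigma_r):
    """Filtered intensity I_D(i, j): normalized weighted average of the
    pixel and its four direct neighbors (plus-shaped aperture)."""
    taps = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    weights = [bilateral_weight(i, j, k, l, img, sigma_d, sigma_r) for k, l in taps]
    total = sum(w * img[k][l] for w, (k, l) in zip(weights, taps))
    return total / sum(weights)
```

On a flat area the weighted average equals the original value, so the filter leaves the pixel untouched; across a sharp edge the range term drives the weights of dissimilar neighbors toward zero, which is what preserves edges.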
  • Rate-Distortion Optimization is part of the video encoding process. It improves coding efficiency by finding the "best" coding parameters. It measures both the number of bits used for each possible decision outcome of the block and the resulting distortion of the block.
  • a deblocking filter (DBF) and a Sample Adaptive Offset (SAO) filter are included in the HEVC standard.
  • DBF deblocking filter
  • SAO Sample Adaptive Offset
  • ALF Adaptive Loop Filter
  • SAO will remove some of the ringing artifacts but there is still room for improvements.
  • a problem with deploying bilateral filters in video coding is that they are too complex and lack sufficient parameter settings and adaptivity.
  • the embodiments disclosed herein relate to further improvements to a filter.
  • a method performed by a filter, for filtering of a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value, wherein a pixel value is modified by a weighted combination of the pixel value and at least one spatially neighboring pixel value.
  • the method comprises storing weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs.
  • the method comprises sharing at least one LUT for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, by storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT, and deriving approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT.
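A minimal sketch of the shared-LUT idea follows. The variable names, the stored σ_r value, and the scale/offset constants are hypothetical illustrations; the embodiments leave the exact approximation function open (a scaling factor, or a scaling factor and offset):

```python
import math

SIGMA_R_INTRA = 4.0        # range parameter used to build the stored (intra) weights
SCALE, OFFSET = 0.5, 0.0   # hypothetical approximation: w_inter ≈ SCALE * w_intra + OFFSET

# One shared LUT, indexed by absolute intensity difference, stores the intra weights.
shared_lut = [math.exp(-(d * d) / (2.0 * SIGMA_R_INTRA ** 2)) for d in range(256)]

def lookup_weight(intensity_diff, intra):
    """Intra weights come straight from the shared LUT; inter weights are
    approximated from the same entries, so only one table is stored."""
    w = shared_lut[min(abs(intensity_diff), 255)]
    return w if intra else SCALE * w + OFFSET
```

Storing one table and deriving the other set of weights on the fly is what halves the LUT storage claimed later in the description.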
  • a filter for filtering of a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value, the filter being configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value.
  • the filter is operative to store weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs.
  • the filter is operative to share at least one LUT for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, by storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT, and deriving approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT.
  • a decoder comprising a modifying means configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value, wherein weighting coefficients for use by the decoder for modifying pixel values are stored in one or more look up tables, LUTs.
  • the decoder comprises at least one shared LUT for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation.
  • the decoder is operative to store weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT, and derive approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT.
  • a computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to the methods described herein, and as defined in the appended claims.
  • a computer program product comprising a computer-readable medium with the computer program as described above.
  • Figures 1 (A) and (B) illustrate the ringing effect on a zoomed original video frame and a zoomed compressed video frame respectively;
  • Figure 2 illustrates an 8x8 transform unit block and the filter aperture for the pixel located at (1, 1);
  • Figure 3 illustrates a plus sign shaped deringing filter aperture
  • Figure 5 illustrates the steps performed in a filtering method according to an example
  • Figure 6 illustrates a filter according to an example
  • Figure 7 illustrates a data processing system in accordance with an example
  • Figure 8 shows an example of a method according to an embodiment
  • Figure 9 shows an example of a filter according to an embodiment
  • Figure 10 shows an example of a decoder according to an embodiment
  • Figure 11 illustrates schematically a video encoder according to an embodiment
  • Figure 12 illustrates schematically a video decoder according to an embodiment.
  • Hardware implementation may include or encompass, without limitation, digital signal processor (DSP) hardware, a reduced instruction set processor, hardware (e.g., digital or analog) circuitry including but not limited to application specific integrated circuit(s) (ASIC) and/or field programmable gate array(s) (FPGA(s)), and (where appropriate) state machines capable of performing such functions.
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • FPGA field programmable gate array
  • a computer is generally understood to comprise one or more processors, one or more processing units, one or more processing modules or one or more controllers, and the terms computer, processor, processing unit, processing module and controller may be employed interchangeably.
  • the functions may be provided by a single dedicated computer, processor, processing unit, processing module or controller, by a single shared computer, processor, processing unit, processing module or controller, or by a plurality of individual computers, processors, processing units, processing modules or controllers, some of which may be shared or distributed.
  • these terms also refer to other hardware capable of performing such functions and/or executing software, such as the example hardware recited above.
  • the filters described herein may be used in any form of user equipment, such as a mobile telephone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or computer.
  • user equipment UE
  • UE user equipment
  • a UE herein may comprise a UE (in its general sense) capable of operating or at least performing measurements in one or more frequencies, carrier frequencies, component carriers or frequency bands.
  • terminal device it may be a “UE” operating in single- or multi- radio access technology (RAT) or multi-standard mode.
  • RAT radio access technology
  • wireless communication device the general terms “terminal device”, “communication device” and “wireless communication device” are used in the following description, and it will be appreciated that such a device may or may not be 'mobile' in the sense that it is carried by a user.
  • the term “terminal device” encompasses any device that is capable of communicating with communication networks that operate according to one or more mobile communication standards, such as the Global System for Mobile communications, GSM, UMTS, Long-Term Evolution, LTE, etc.
  • a UE may comprise a Universal Subscription Identity Module (USIM) on a smart-card or implemented directly in the UE, e.g., as software or as an integrated circuit.
  • USIM Universal Subscription Identity Module
  • the operations described herein may be partly or fully implemented in the USIM or outside of the USIM.
  • the embodiments described herein are concerned with reducing the number of look-up tables (LUTs) needed in filters, including for example a deringing filter as described in the earlier application.
  • LUTs look-up tables
  • a deringing filter apparatus and method reuse at least one of the same look-up table LUT for both inter and intra, and use an approximation function, for example a scaling factor, or a scaling factor and offset, to obtain an approximation of one from the other.
  • an approximation function, for example a scaling factor, or a scaling factor and offset
  • an advantage of embodiments described herein is that the size of the LUT is reduced, for example by 50%. This is advantageous in hardware implementations, where LUT size comes at a premium.
  • each pixel uses four or five weights. In some implementations it may be of interest to read these four or five weights at the same time. This would mean that the LUT would have to be implemented not once, but four or five times.
  • the filter according to the embodiments described herein may be implemented in a video encoder and a video decoder. It may be implemented in hardware, in software or a combination of hardware and software.
  • the filter may be implemented in, e.g. comprised in, user equipment, such as a mobile telephone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or a computer.
  • the examples of the earlier application provide advantages in that the proposed filtering removes ringing artifacts in compressed video frames so a better video quality (both objectively and subjectively) can be achieved with a small increase in codec complexity.
  • Objectively, coding efficiency as calculated by Bjøntegaard-Delta bit rate (BD-rate) is improved by between 0.5 and 0.7%.
  • a bilateral deringing filter with a plus sign shaped filter aperture is used directly after inverse transform.
  • An identical filter and identical filtering process is used in the corresponding video encoder and decoder to ensure that there is no drift between the encoder and the decoder.
  • the first example describes a way to remove ringing artifacts by using a deringing filter designed in the earlier application.
  • the deringing filter is evolved from a bilateral filter.
  • each pixel in the reconstructed picture is replaced by a weighted average of itself and its neighbors. For instance, a pixel located at (i, j) will be filtered using its neighboring pixel (k, l).
  • the weight ω(i, j, k, l) is the weight assigned for pixel (k, l) to filter the pixel (i, j), and it is defined, as mentioned earlier, as:

        ω(i, j, k, l) = exp( −((i − k)² + (j − l)²) / (2σ_d²) − (I(i, j) − I(k, l))² / (2σ_r²) )

    I(i, j) and I(k, l) are the original reconstructed intensity values of pixels (i, j) and (k, l) respectively.
  • σ_d is the spatial parameter
  • σ_r is the range parameter.
  • the bilateral filter is controlled by these two parameters.
  • the weight of a reference pixel (k, l) for the pixel (i, j) depends both on the distance between the pixels and the intensity difference between the pixels.
  • pixels located closer to the pixel to be filtered, and with a smaller intensity difference to it, get larger weights than the more distant (spatially or in intensity) pixels.
  • σ_d and σ_r are constant values.
  • the deringing filter, in this example, is applied to each TU block after the inverse transform in an encoder, as shown in Figure 2, which shows an example of an 8x8 block. This means, for example, that subsequent intra-coded blocks will predict from the filtered pixel values.
  • the filter may also be used during R-D optimization in the encoder.
  • the identical deringing filter is also applied to each TU block after the inverse transform in the corresponding video decoder.
  • each pixel in the transform unit is filtered using its direct neighboring pixels only, as shown in Figure 3.
  • the filter has a plus sign shaped filter aperture centered at the pixel to be filtered.
  • the output filtered pixel intensity I_D(i, j), as mentioned earlier, is defined as:

        I_D(i, j) = Σ I(k, l) · ω(i, j, k, l) / Σ ω(i, j, k, l)
  • all possible weights (coefficients) of the proposed deringing filter are calculated and stored in a two-dimensional look-up table (LUT).
  • the LUT can, for instance, use the spatial distance and intensity difference between the pixel to be filtered and the reference pixels as indices of the LUT.
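Such a two-dimensional LUT might be precomputed as follows (a sketch; the table bounds and parameter values are illustrative):

```python
import math

def build_weight_lut(sigma_d, sigma_r, max_sq_dist=2, max_diff=255):
    """2D LUT indexed by [squared spatial distance][absolute intensity
    difference]; each entry is the precomputed bilateral weight."""
    return [[math.exp(-d2 / (2.0 * sigma_d ** 2)
                      - diff * diff / (2.0 * sigma_r ** 2))
             for diff in range(max_diff + 1)]
            for d2 in range(max_sq_dist + 1)]

lut = build_weight_lut(sigma_d=0.92, sigma_r=2.0)
w = lut[1][3]  # weight for a direct neighbor (squared distance 1) with |ΔI| = 3
```

With the plus-shaped aperture only squared distance 0 (the center) and 1 (direct neighbors) are ever indexed, which is why a one-dimensional table over intensity difference can suffice in practice.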
  • the filter aperture is a plus sign shape
  • LUT one-dimensional look-up table
  • w_d weight as a function of distance from the current pixel
  • w_r weight as a function of closeness in pixel value
  • the LUT could be optimized based on some error metric (SSD, SSIM) or according to human vision.
  • alternatively, one could have one LUT for weights vertically above or below the current pixel and another LUT for weights horizontally left or right of the current pixel.
  • a deringing filter with a rectangular shaped filter aperture is used in the video encoder's R-D optimization process.
  • the same filter is also used in the corresponding video decoder.
  • each pixel is filtered using its neighboring pixels within a M by N size rectangular shaped filter aperture centered at the pixel to be filtered, as shown in Figure 4.
  • the same deringing filter as in the first example is used.
  • the deringing filter according to the third example of the earlier application is used after prediction and transform have been performed for an entire frame or part of a frame.
  • the same filter is also used in the corresponding video decoder.
  • the third example is the same as the first or second example, except that the filtering is not done right after the inverse transform. Instead the proposed filter is applied to the reconstructed picture in both encoder and decoder. On the one hand this could lead to worse performance since filtered pixels will not be used for intra prediction, but on the other hand the difference is likely very small and the existing filters are currently placed at this stage of the encoder/decoder.
  • Example 4 In this example, σ_d and/or σ_r are related to TU size.
  • σ_d and σ_r can be a function of the form (e.g. a polynomial function): σ_d = f₁(TU size), σ_r = f₂(TU size).
  • when σ_d and σ_r are derived based on TU size, a preferred example is to have different functions f₁ ≠ f₂.
  • σ_d = 0.92 − max{TU block width, TU block height} · 0.025
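The relation above transcribes directly to code (the constants are taken from the example itself):

```python
def sigma_d(tu_width, tu_height):
    """Spatial parameter as a function of TU size, per the example above:
    larger blocks get a smaller spatial parameter."""
    return 0.92 - max(tu_width, tu_height) * 0.025
```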
  • σ_d can be separate for the vertical and horizontal filter coefficients, so σ_d_ver, σ_d_hor and σ_r_ver, σ_r_hor can each be a function of the form (e.g. a polynomial function) of TU size.
  • a further generalization is to have a weight and/or size dependent on distance, based on a function of TU size or TU width or TU height, and a weight and/or size dependent on pixel closeness, based on a function of TU size or TU width or TU height.
  • bit_depth corresponds to the video bit depth, i.e. the number of bits used to represent pixels in the video, e.g. bit_depth = 10.
  • σ_r = clip( (QP − 17) / 2, 0.01 )
  • the QP mentioned here relates to the coarseness of the quantization of transform coefficients.
  • the QP can correspond to a picture or slice QP or even a locally used QP, i.e. QP for TU block.
  • QP can be defined differently in different standards, so that the QP in one standard does not correspond to the QP in another standard.
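Reading clip(x, 0.01) as a lower bound of 0.01 (an assumption; the text does not spell out the clip semantics), the QP relation can be sketched as:

```python
def sigma_r(qp):
    """Range parameter from the QP, lower-bounded at 0.01 (assumed clip
    semantics): coarser quantization widens the range kernel."""
    return max((qp - 17) / 2.0, 0.01)
```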
  • HEVC High Efficiency Video Coding
  • in JEM, six steps of QP change double the quantization step size. This could be different in a final version of H.266, where steps could be finer or coarser and the range could be extended beyond 51.
  • the range parameter is a polynomial model, for example first order model, of the QP.
  • Another approach is to define a table with an entry for each QP, where each entry relates to the reconstruction level of at least one transform coefficient quantized with QP to 1.
  • a table of σ_d and/or a table of σ_r is created where each entry, i.e., QP value, relates to the reconstruction level, i.e., the pixel value after inverse transform and inverse quantization, for one transform coefficient quantized with QP to 1, e.g., the smallest possible value a quantized transform coefficient can have.
  • This reconstruction level indicates the smallest pixel value change that can originate from a true signal. Changes smaller than half of this value can be regarded as coding noise that the deringing filter should remove.
  • HEVC uses by default a uniform reconstruction quantization (URQ) scheme that quantizes frequencies equally.
  • HEVC has the option of using quantization scaling matrices, also referred to as scaling lists, either default ones, or quantization scaling matrices that are signaled as scaling list data in the sequence parameter set (SPS) or picture parameter set (PPS).
  • SPS sequence parameter set
  • PPS picture parameter set
  • scaling matrices are typically only specified for 4x4 and 8x8 sizes. For the larger transform sizes of 16x16 and 32x32, the signaled 8x8 matrix is applied by having 2x2 and 4x4 blocks share the same scaling value, except at the DC positions.
  • a scaling matrix with individual scaling factors for each transform coefficient can be used to apply a different quantization effect per transform coefficient, by scaling the transform coefficients individually with their respective scaling factors as part of the quantization. This enables, for example, a stronger quantization effect for higher-frequency transform coefficients than for lower-frequency transform coefficients.
  • default scaling matrices are defined for each transform size and can be invoked by flags in the SPS and/or the PPS. Scaling matrices also exist in H.264. In HEVC it is also possible to define own scaling matrices in SPS or PPS specifically for each combination of color component, transform size and prediction type (intra or inter mode).
  • deringing filtering is performed for at least the reconstruction sample values from one transform coefficient, using the corresponding scaling factor, as the QP, to determine σ_d and/or σ_r.
  • This could be performed before adding the intra/inter prediction or after adding the intra/inter prediction.
  • Another, less complex approach would be to use the maximum or minimum scaling factor, as the QP, to determine σ_d and/or σ_r.
  • the size of the filter can also be dependent on the QP, so that the filter is larger for larger QPs than for smaller QPs.
  • the width and/or the height of the filter kernel of the deringing filter is defined for each QP.
  • Another example is to use a first width and/or a first height of the filter kernel for QP values equal to or smaller than a threshold, and a second, different width and/or a second, different height for QP values larger than the threshold.
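The threshold variant can be sketched as follows (the threshold value and both kernel sizes are hypothetical; the text leaves them open):

```python
QP_THRESHOLD = 30  # hypothetical threshold

def kernel_size(qp):
    """First width/height for QPs at or below the threshold, a second,
    larger width/height above it."""
    return (3, 3) if qp <= QP_THRESHOLD else (5, 5)
```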
  • σ_d and σ_r are related to video resolution.
  • σ_d and σ_r can be a function of the form: σ_d = f₅(frame diagonal), σ_r = f₆(frame diagonal).
  • the size of the filter can also be dependent on the size of the frame. If both σ_d and σ_r are derived based on the frame diagonal, a preferred example is to have different functions f₅ ≠ f₆.
  • At least one of the spatial parameter and the range parameter can be set such that stronger deringing filtering is applied for small resolutions as compared to large resolutions.
  • σ_d and σ_r are related to QP, TU block size, video resolution and other video properties.
  • σ_d and σ_r can be functions of the form:
  • σ_r = f₈(QP, TU size, frame diagonal, …)
  • the de-ringing filter is applied if an inter prediction is interpolated, e.g. not integer-pixel motion, or the intra prediction is predicted from reference samples in a specific direction (e.g. non-DC), or the transform block has non-zero transform coefficients.
  • De-ringing can be applied directly after intra/inter prediction to improve the accuracy of the prediction signal or directly after the transform on residual samples to remove transform effects or on reconstructed samples (after addition of intra/inter prediction and residual) to remove both ringing effects from prediction and transform or both on intra/inter prediction and residual or reconstruction.
  • Example 9
  • the filter weights (w_d, w_r, or similarly σ_d, σ_r) and/or the filter size can be set individually for intra prediction mode and/or inter prediction mode.
  • the filter weights and/or filter size can be different in the vertical and horizontal direction depending on the intra prediction mode or the interpolation filter used for inter prediction. For example, if close-to-horizontal intra prediction is performed, the weights could be smaller for the horizontal direction than the vertical direction, and for close-to-vertical intra prediction the weights could be smaller for the vertical direction than the horizontal direction. If sub-pel interpolation with an interpolation filter with negative filter coefficients is applied only in the vertical direction, the filter weights could be smaller in the horizontal direction than in the vertical direction; and if such an interpolation filter is applied only in the horizontal direction, the filter weights could be smaller in the vertical direction than in the horizontal direction.
  • Example 10
  • the filter weights (w_d, w_r, or similarly σ_d, σ_r) and/or filter size can depend on the position of non-zero transform coefficients.
  • the filter weights and/or filter size can be different in the vertical and horizontal direction depending on the non-zero transform coefficient positions. For example, if non-zero transform coefficients only exist in the vertical direction at the lowest frequency in the horizontal direction, the filter weights can be smaller in the horizontal direction than in the vertical direction. Alternatively, the filter is only applied in the vertical direction. Similarly, if non-zero transform coefficients only exist in the horizontal direction at the lowest frequency in the vertical direction, the filter weights can be smaller in the vertical direction than in the horizontal direction. Alternatively, the filter is only applied in the horizontal direction.
  • the filter weights and/or filter size can also be dependent on existence of non-zero transform coefficients above a certain frequency.
  • the filter weights can be smaller if only low frequency non-zero transform coefficients exist than when high frequency non-zero transform coefficients exist.
  • the filter weights (w_d, w_r, or similarly σ_d, σ_r) and/or filter size can differ depending on the transform type.
  • Type of transform can refer to transform skip, KLT like transforms, DCT like transforms, DST transforms, non-separable 2D transforms, rotational transforms and combination of those.
  • the bilateral filter could be applied only to fast transforms, with a weight equal to 0 for all other transform types. Some types of transforms can require smaller weights than others since they cause less ringing.
  • the filtering may be implemented as a differential filter whose output is clipped (Clip) to be larger than or equal to a MIN value and less than or equal to a MAX value, and added to the pixel value, instead of using a smoothing filter kernel like the Gaussian.
  • I_D(i, j) = I(i, j) + s · Clip(MIN, MAX, Σ I(k, l) · ω(i, j, k, l))   (3)
  • the differential filter can for example be designed as the difference between a dirac function and a Gaussian filter kernel.
  • a sign (s) can optionally also be used to make the filtering to enhance edges rather than smooth edges if that is desired for some cases.
  • the MAX and MIN value can be a function of other parameters as discussed in other examples.
  • the usage of a clipping function can be omitted but allows for an extra freedom to limit the amount of filtering enabling the use of a stronger bilateral filter although limiting how much it is allowed to change the pixel value.
  • the filtering can be described as a vertical filtering part and a horizontal filtering part as shown below:
  • I_D(i, j) = I(i, j) + s · ( Clip(MIN_ver, MAX_ver, Σ_ver I(k, l) · ω(i, j, k, l)) + Clip(MIN_hor, MAX_hor, Σ_hor I(k, l) · ω(i, j, k, l)) )   (4), where Σ_ver sums over the vertical neighbors and Σ_hor over the horizontal neighbors.
  • MAX_hor, MAX_ver, MIN_hor and MIN_ver can be a function of other parameters as discussed in other examples.
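A sketch of the clipped differential filtering of Equation (3) (the kernel and clipping bounds are illustrative; per the example text, the differential kernel could be designed as the difference between a dirac function and a Gaussian):

```python
def clip(lo, hi, x):
    return max(lo, min(hi, x))

def diff_filter_pixel(i, j, img, kernel, s, lo, hi):
    """Equation (3) sketch: a clipped differential-filter output is added
    to the pixel value.

    kernel: {(di, dj): weight} differential weights (summing to ~0);
    s: a sign that flips between smoothing and edge enhancement."""
    delta = sum(w * img[i + di][j + dj] for (di, dj), w in kernel.items())
    return img[i][j] + s * clip(lo, hi, delta)
```

The clip bounds cap how much a single filtering pass may change the pixel, which is what allows a stronger filter without large excursions.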
  • One aspect is to keep the size of a LUT small.
  • consider the σ_d and σ_r parameters used in Equation 1.
  • Equation 1 can be rewritten as:

        ω(i, j, k, l) = exp( −((i − k)² + (j − l)²) / (2σ_d²) ) · exp( −(I(i, j) − I(k, l))² / (2σ_r²) )   (5)

  • The first factor of the expression in Equation 5 depends on σ_d. Since there are four TU sizes, there are four different possible values of σ_d.
  • for the plus-shaped aperture every neighbor pixel has (i − k)² + (j − l)² = 1, so ω(i, j, k, l) = e^(−1/(2σ_d²)) · η(i, j, k, l), where η(i, j, k, l) = exp( −(I(i, j) − I(k, l))² / (2σ_r²) ) if (k, l) is a neighbor pixel; for the middle pixel (k, l) = (i, j) the weight is simply 1.
  • Equation (2) thus becomes:
  • the approach as described above can be implemented with filtering in float or in integers (8, 16 or 32 bit).
  • a table lookup is used to determine each weight.
  • filtering in integers can avoid division by doing a table lookup of a multiplication factor and a shift factor:

        I_D(i, j) = ( Σ I(k, l) · ω(i, j, k, l) · lookup_M(A) + roundF ) » lookup_Sh(A), with A = Σ ω(i, j, k, l)
  • lookup_M determines a multiplication factor to increase the gain of the filtering to close to unity (weights sum up to 1 « lookup_Sh) given that the "division" using right shift (») has the shift value (lookup_Sh) limited to be a multiple of 2.
  • lookup_Sh(A) gives a shift factor that together with the multiplication factor lookup_M gives a sufficient approximation of 1/A.
  • roundF is a rounding factor which is equal to lookup_Sh » 1. If this approximation is done so that the gain is less or equal to unity the filtering will not increase the value of the filtered pixel outside the value of the pixel values in the neighborhood before the filtering.
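The multiply-and-shift trick can be sketched as follows (using a single fixed shift for simplicity; the text instead looks up a per-divisor shift via lookup_Sh, so this is an assumption-laden simplification):

```python
SH = 16  # fixed shift; the text allows a per-value shift via lookup_Sh

def build_lookup_m(max_a):
    """(x * lookup_M[a]) >> SH approximates x / a; flooring the table
    entry keeps the effective gain at or below unity."""
    return [0] + [(1 << SH) // a for a in range(1, max_a + 1)]

def div_approx(x, a, lookup_m):
    round_f = 1 << (SH - 1)  # rounding offset before the right shift
    return (x * lookup_m[a] + round_f) >> SH
```

Keeping the gain at or below unity means the filtered pixel cannot exceed the range of the neighborhood pixel values before filtering, mirroring the guarantee described above.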
  • one approach to reduce the amount of filtering is to omit filtering if the sum of the weights is equal to the weight for the center pixel.
  • one set of weights (w_d, w_r, or similarly σ_d, σ_r) and/or filter size is used for blocks that have been intra predicted and another set of weights and/or filter size is used for blocks that have been inter predicted.
  • the weights are set to reduce the amount of filtering for blocks which have been predicted with higher quality compared to blocks that have been predicted with lower quality. Since blocks that have been inter predicted typically have higher quality than blocks that have been intra predicted, they are filtered less to preserve the prediction quality.
  • one set of weights (wd, wr or similarly σ_d, σ_r) and/or filter size depends on picture type/slice type.
  • One example is to use one set of weights for intra pictures/slices and another set of weights for inter pictures/slices.
  • One example is to have one wd (or similarly σ_d) for pictures/slices that have only been intra predicted and a smaller wd (or similarly σ_d) for other pictures/slices.
  • B slices, which typically have better prediction quality than P slices (only single prediction), can in another variant of this example have a smaller weight than P slices.
  • generalized B-slices that are used instead of P-slices for uni-directional prediction can have the same weight as P-slices.
  • "normal" B-slices that can predict from both future and past can have a larger weight than generalized B-slices.
  • Example 19: In this example one set of weights (wd, wr or similarly σ_d, σ_r) and/or filter size is used for intra pictures/slices, another set of weights is used for inter pictures/slices that are used as reference for prediction of other pictures, and a third set of weights is used for inter pictures/slices that are not used as reference for prediction of other pictures.
  • One example is to have one wd (or similarly σ_d) for pictures/slices that have only been intra predicted, a somewhat smaller wd (or similarly σ_d) for pictures/slices that have been inter predicted and are used for predicting other pictures, and the smallest wd (or similarly σ_d) for pictures/slices that have been inter predicted but are not used for prediction of other pictures (non-reference pictures).
  • Example weights for inter pictures/slices (e.g. P_SLICE, B_SLICE) that are used for reference are:
  • an encoder can select which values of the weights to use and encode them in SPS (sequence parameter sets), PPS (picture parameter sets) or the slice header.
  • a decoder can then decode the values of the weights to be used for filtering respective picture/slice.
  • specific values of the weights for blocks that are intra predicted, compared to blocks that are inter predicted, are encoded in the SPS/PPS or slice header.
  • a decoder can then decode the values of the weights to be used for blocks that are intra predicted and the values of the weights to be used for blocks that are inter predicted.
  • a data processing system can be used to implement the filter of the examples described above.
  • the data processing system includes at least one processor that is further coupled to a network interface via an interconnect.
  • the at least one processor is also coupled to a memory via the interconnect.
  • the memory can be implemented by a hard disk drive, flash memory, or read-only memory and stores computer-readable instructions.
  • the at least one processor executes the computer- readable instructions and implements the functionality described above.
  • the network interface enables the data processing system to communicate with other nodes in a network.
  • Alternative examples may include additional components responsible for providing additional functionality, including any functionality described above and/or any functionality necessary to support the solution described herein.
  • a filter as described in the embodiments below, or the examples above, may be implemented in a video encoder and a video decoder. It may be implemented in hardware, in software or a combination of hardware and software.
  • the filter may be implemented in, e.g. comprised in, user equipment, such as a mobile telephone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or a computer.
  • a filter such as a deringing filter apparatus and method reuse the same look-up table LUT for both inter and intra, and use an approximation function, for example a scaling factor, or a scaling factor and offset, to obtain an approximation of one from the other.
  • Figure 8 discloses a method according to an embodiment, performed by a filter, for filtering of a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value.
  • a pixel value is modified by a weighted combination of the pixel value and at least one spatially neighboring pixel value.
  • the method comprises storing weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs, step 801.
  • the method comprises sharing at least one LUT (LUTSHARED) for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, step 803.
  • the sharing is performed by storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT (LUTSHARED), step 805.
  • Approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation are derived using an approximation function to modify the weighting coefficients stored in the at least one shared LUT (LUTSHARED), step 807.
  • the approximation function comprises a scaling factor (s), for scaling an obtained weighting coefficient ω from the at least one shared LUT (LUTSHARED) by the scaling factor (s).
  • the approximation function comprises a scaling factor (s) for scaling an obtained weighting coefficient ω from the at least one shared LUT (LUTSHARED) by the scaling factor (s), and an offset value (p).
  • weighting coefficients relating to intra decoding operations may be stored in a shared LUT, i.e. weighting coefficients ω_intra;
  • approximated weighting coefficients relating to inter decoding operations may then be derived therefrom, i.e. weighting coefficients ω_inter.
  • weighting coefficients relating to inter decoding operations are stored in a shared LUT, and approximated weighting coefficients relating to intra decoding operations being derived therefrom, with the approximation functions being adapted accordingly.
  • a preferred approach is to store the intra LUT and re-use it for inter because the strength of filtering is typically larger for intra and thus the LUT for intra contains more non-zero weights that can be scaled away when re-using it for inter. In this way, a higher accuracy of the filter weights as well as the resulting filtered value can be preserved.
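A minimal sketch of this sharing, assuming a hypothetical intra LUT indexed by absolute intensity difference and an illustrative inter/intra strength ratio s = 0.5:

```python
INTRA_LUT = [max(0, 64 - d) for d in range(256)]  # hypothetical intra weights
S = 0.5                                           # hypothetical scaling factor

def weight(abs_diff, is_intra):
    w = INTRA_LUT[min(abs_diff, 255)]
    if is_intra:
        return w
    # Inter weights are approximated by scaling the stored intra weight.
    # Since the intra weights are larger (stronger filtering), scaling
    # them down preserves more accuracy than storing the small inter
    # weights directly and scaling them up would.
    return int(w * S)
```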
  • one possible optimization is to realize that not all 1024 possible differences are needed.
  • the value of the weight in Equation (1) only depends on the absolute value of the difference between two luma values (or intensity values). If I(i, j) varies between 0 and 1023 and I(k, l) also varies between 0 and 1023, there are only 1024 possible values of the absolute difference |I(i, j) − I(k, l)|. Hence the intensity dimension of the LUT does not need to hold more than 1024 entries.
  • for large absolute differences, the resulting weight ω is so small that it is possible to set the weight to zero without much of an error.
  • the first LUT (LUTFIRST) stores the value e^(−((i−k)² + (j−l)²)/(2σ_d²)), which depends on (i−k)² + (j−l)² and σ_d.
  • for a plus-shaped filter aperture, (i−k)² + (j−l)² is always equal to 1 for every pixel except the middle pixel.
  • the expression therefore only depends on σ_d for these pixels, and σ_d depends in turn only on the TU size, of which there is a predetermined number, for example three TU sizes. Thus, if we are not in the middle pixel, only three values need to be stored.
  • the filter shape (filter aperture) is thus such that the first expression of the equation can be represented in a first LUT (LUTFIRST) by a reduced set of values.
  • the reduced set of values may be stored in a one-dimensional LUT such that the first LUT comprises a reduced LUT (LUTREDUCED) comprising T+1 values, wherein:
  • a first value relates to a middle pixel value comprising an integer 1 ; and the T values relate to T different possible TU sizes.
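This reduced first LUT can be sketched as follows (the σ_d values per TU size are illustrative assumptions, with T = 3):

```python
import math

TU_SIGMA_D = {4: 0.5, 8: 0.75, 16: 1.0}  # hypothetical sigma_d per TU size

# Entry 0 is the middle pixel: the spatial exponent is 0, so the value is 1.
# Entries 1..T hold e^(-1/(2*sigma_d^2)) for each TU size, since every
# neighbor in the plus-shaped aperture has (i-k)^2 + (j-l)^2 == 1.
LUT_FIRST = [1.0] + [math.exp(-1.0 / (2.0 * s * s))
                     for s in TU_SIGMA_D.values()]
```

The table thus holds T + 1 = 4 values in total rather than one value per pixel position.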
  • The second LUT (LUTSECOND), relating to the second expression of Equation 13, depends both on the absolute difference in intensity |I(i, j) − I(k, l)| and on σ_r, which in turn depends on QP.
  • a LUT is provided that spans all the absolute differences from 0 to absmax times the number of QPs.
  • a typical value of absmax can be 244, for example, which makes the maximum filter error smaller than 0.5. Smaller values of absmax are also possible.
  • the two-dimensional LUT can be reused, i.e. shared, for both intra and inter coding/decoding operations.
  • a possible drawback with having two separable look-up tables according to Equation 13 is that an extra multiplication is needed.
  • the weight calculation is implemented as a single 3D LUT. As described above, it is possible to only tabulate from 0 to absmax, so the 3D look-up table can be of the size ⁇ absmax+1)x(QP)x(TUsizes) or 245x52x3 which equals 38220.
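The two-dimensional table over (absolute difference, QP) can be sketched as below; the mapping from QP to σ_r is an assumption for illustration, while the dimensions match the text:

```python
import math

ABSMAX = 244    # differences above this map to weight 0
QPS = 52
TU_SIZES = 3

def sigma_r(qp):
    # Hypothetical mapping from QP to the range parameter.
    return max(qp - 17, 1) / 2.0

# 2D LUT shared between intra and inter: (ABSMAX + 1) x QPS entries.
LUT_SECOND = [[math.exp(-(d * d) / (2.0 * sigma_r(qp) ** 2))
               for d in range(ABSMAX + 1)]
              for qp in range(QPS)]

# A combined 3D LUT would add the TU-size dimension:
print((ABSMAX + 1) * QPS * TU_SIZES)  # 38220, as stated in the text
```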
  • the scaling factor (s) here is not going to be an integer in general, instead it will be a floating point number.
  • it is customary to multiply ω by 2^k · s (rounded to an integer), add 2^(k−1) for rounding, and then shift down by k.
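A sketch of that fixed-point scaling (the precision k and the value of s are illustrative):

```python
K = 8                        # fixed-point precision k (assumed)
s = 0.37                     # example floating point scaling factor
s_fix = round(s * (1 << K))  # quantized factor: round(0.37 * 256) = 95

def scale_weight(w):
    # Multiply by 2^k * s, add 2^(k-1) for rounding, then shift down by k.
    return (w * s_fix + (1 << (K - 1))) >> K
```

For example, scale_weight(100) yields 37, matching round(100 · 0.37) with no floating point arithmetic at filtering time.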
  • the scaling factor (s) comprises a predetermined number of possible values.
  • the scaling factor (s) comprises ST possible scaling values, and an approximated inter weighting coefficient ω_inter(i, j, k, l) assigned for a pixel (k, l) to filter a pixel (i, j) is determined as: ω_inter(i, j, k, l) = ω_intra(i, j, k, l) · s.
  • the scaling factor (s) is independent of transform unit, TU, size.
  • the scaling factor (s) is a constant value.
  • improved results may be obtained by adding an offset, i.e., the value is calculated as:
  • ω_inter(i, j, k, l) = ω(i, j, k, l) · s + p, where p is an offset that depends on TU size.
  • the approximation function comprises a scaling factor (s) for scaling an obtained weighting coefficient ω from the at least one shared LUT (LUTSHARED) by the scaling factor (s), and an offset value (p).
  • weighting coefficients stored in a shared LUT relate to intra weighting coefficients (ω_intra) for use with intra coding/decoding operations, and an approximated inter weighting coefficient ω_inter(i, j, k, l) assigned for a pixel (k, l) to filter a pixel (i, j) is determined as: ω_inter(i, j, k, l) = ω_intra(i, j, k, l) · s + p.
  • the offset value (p) may be an offset that depends on transform unit, TU, size.
  • neither the offset (p) nor the scaling factor (s) depends on TU size.
  • the filtering strength of the bilateral filter is lowered in another manner.
  • one or more further scaling factors are used to lower a filtering strength of the filtering operation. For example, instead of multiplying the obtained ω by a scaling factor s, an offset is added to the absolute value calculation.
  • |I(i, j) − I(k, l)| + o is used, where o is an offset. If o is large, a value is obtained as if the absolute difference were larger, which results in a smaller ω, which in turn gives a lower filtering strength.
  • the absolute difference |I(i, j) − I(k, l)| is multiplied by a scaling factor q, according to q · |I(i, j) − I(k, l)|. If the scaling factor q is larger than 1, this has the effect of obtaining a value as if the absolute difference were larger, which results in a smaller ω, which in turn gives a lower filtering strength.
  • the absolute difference |I(i, j) − I(k, l)| is first multiplied by a scaling factor q and then an offset r is added, according to q · |I(i, j) − I(k, l)| + r. This has the effect of obtaining a value as if the absolute difference were larger, which results in a smaller ω, which in turn gives a lower filtering strength.
  • it is also possible to make the scaling factor q smaller than 1 in order to get stronger filtering.
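The three remappings of the absolute difference can be sketched in one helper (o, q and r are illustrative values; a larger index means a smaller looked-up weight):

```python
def lut_index(abs_diff, o=0, q=1.0, r=0):
    # A larger index emulates a larger absolute difference, which maps
    # to a smaller weight omega and hence weaker filtering.
    return int(q * abs_diff + r) + o

weak_offset = lut_index(10, o=5)       # offset only: index 15
weak_scale = lut_index(10, q=1.5)      # scale only: index 15
weak_both = lut_index(10, q=1.5, r=3)  # scale then offset: index 18
stronger = lut_index(10, q=0.5)        # q < 1: index 5, stronger filtering
```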
  • σ_r becomes 0.01, which means the expression becomes equal to e^(−(I(i, j) − I(k, l))²/(2 · 0.01²)), which is a very small number unless the pixels are equal.
  • the filter is effectively turned off when the QP is smaller than 18.
  • the filter is turned completely off when the QP is smaller than 18.
  • different values of absmax can be used for different QPs.
  • the absmax value may instead be 122 (half of previously mentioned absmax value of 244 in the example above).
  • Figure 9 shows an example of a filter 900 according to an embodiment, whereby the filter is implemented as a data processing system.
  • the data processing system includes at least one processor 901 that is further coupled to a network interface 905 via an interconnect.
  • the at least one processor 901 is also coupled to a memory 903 via the interconnect.
  • the memory 903 can be implemented by a hard disk drive, flash memory, or read-only memory and stores computer-readable instructions.
  • the at least one processor 901 executes the computer-readable instructions and implements the functionality described in the embodiments above.
  • the network interface 905 enables the data processing system 900 to communicate with other nodes in a network.
  • Alternative examples may include additional components responsible for providing additional functionality, including any functionality described above and/or any functionality necessary to support the solution described herein.
  • the filter 900 may be operative to filter a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value, the filter being configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value.
  • the filter 900 comprises at least one shared LUT (LUTSHARED) 907.
  • the filter 900 may be operative to store weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs; share at least one LUT (LUTSHARED, 907) for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, by: storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT (LUTSHARED); and deriving approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT (LUTSHARED, 907).
  • the filter 900 may be further operative to perform filtering operations as described here, and defined in the appended claims.
  • Figure 10 shows an example of a decoder 1000 that comprises a modifying means 1001 , for example a filter as described herein, configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value, wherein weighting coefficients for use by the decoder for modifying pixel values are stored in one or more look up tables, LUTs.
  • the decoder 1000 comprises at least one shared LUT (LUTSHARED) 1003, for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation.
  • the decoder 1000 is operative to: store weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT (LUTSHARED, 1003); and derive approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT (LUTSHARED, 1003).
  • the decoder 1000 may be further operative to perform a filtering method as described herein, and as defined in the appended claims.
  • at least one of the parameters σ_d and σ_r described above may also depend on at least one of: quantization parameter, quantization scaling matrix, transform width, transform height, picture width, picture height, or a magnitude of a negative filter coefficient used as part of inter/intra prediction.
  • the embodiments described herein provide an improved filter for video decoding. It is noted that any of the embodiments relating to decoding may also be used for coding.
  • Figure 1 1 is a schematic block diagram of a video encoder 40 according to an embodiment.
  • a current sample block also referred to as pixel block or block of pixels, is predicted by performing a motion estimation by a motion estimator 50 from already encoded and reconstructed sample block(s) in the same picture and/or in reference picture(s).
  • the result of the motion estimation is a motion vector in the case of inter prediction.
  • the motion vector is utilized by a motion compensator 50 for outputting an inter prediction of the sample block.
  • An intra predictor 49 computes an intra prediction of the current sample block.
  • the outputs from the motion estimator/compensator 50 and the intra predictor 49 are input in a selector 51 that either selects intra prediction or inter prediction for the current sample block.
  • the output from the selector 51 is input to an error calculator in the form of an adder 41 that also receives the sample values of the current sample block.
  • the adder 41 calculates and outputs a residual error as the difference in sample values between the sample block and its prediction, i.e., prediction block.
  • the error is transformed in a transformer 42, such as by a discrete cosine transform (DCT), and the resulting coefficients are quantized by a quantizer 43 followed by coding in an encoder 44, such as by an entropy encoder.
  • the estimated motion vector is brought to the encoder 44 for generating the coded representation of the current sample block.
  • the transformed and quantized residual error for the current sample block is also provided to an inverse quantizer 45 and inverse transformer 46 to reconstruct the residual error.
  • This residual error is added by an adder 47 to the prediction output from the motion compensator 50 or the intra predictor 49 to create a reconstructed sample block that can be used as prediction block in the prediction and coding of other sample blocks.
  • This reconstructed sample block is first processed by a device 100 for filtering of a picture according to the embodiments in order to suppress deringing artifacts.
  • the modified, i.e., filtered, reconstructed sample block is then temporarily stored in a Decoded Picture Buffer (DPB) 48, where it is available to the intra predictor 49 and the motion estimator/compensator 50.
  • the modified, i.e. filtered, reconstructed sample block from device 100 is also coupled directly to the intra predictor 49.
  • the device 100 is preferably instead arranged between the inverse transformer 46 and the adder 47.
  • FIG. 12 is a schematic block diagram of a video decoder 60 comprising a device 100 for filtering of a picture according to the embodiments.
  • the video decoder 60 comprises a decoder 61 , such as an entropy decoder, for decoding a bitstream comprising an encoded representation of a sample block to get a set of quantized and transformed coefficients. These coefficients are dequantized in an inverse quantizer 62 and inverse transformed by an inverse transformer 63 to get a decoded residual error.
  • the decoded residual error is added in an adder 64 to the sample prediction values of a prediction block.
  • the prediction block is determined by a motion estimator/compensator 67 or intra predictor 66, depending on whether inter or intra prediction is performed.
  • a selector 68 is thereby interconnected to the adder 64 and the motion estimator/compensator 67 and the intra predictor 66.
  • the resulting decoded sample block output from the adder 64 is input to a device 100 for filtering of a picture or part of a picture in order to suppress and combat any ringing artifacts.
  • the filtered sample block enters a DPB 65 and can be used as prediction block for subsequently decoded sample blocks.
  • the DPB 65 is thereby connected to the motion estimator/compensator 67 to make the stored sample blocks available to the motion estimator/compensator 67.
  • the output from the device 100 is preferably also input to the intra predictor 66 to be used as an unfiltered prediction block.
  • the filtered sample block is furthermore output from the video decoder 60, such as output for display on a screen.
  • the device 100 is preferably instead arranged between the inverse transformer 63 and the adder 64.
  • One idea of embodiments of the present invention is to introduce a deringing filter into the Future Video Codec, i.e., the successor to HEVC.
  • a bilateral filter works by basing the filter weights not only on the distance to the sample to be filtered, but also on the difference in intensity.
  • a sample located at (i, j) will be filtered using its neighboring sample (k, l).
  • the weight ω(i, j, k, l) is the weight assigned for sample (k, l) to filter the sample (i, j), and it is defined as ω(i, j, k, l) = e^(−((i−k)² + (j−l)²)/(2σ_d²) − (I(i, j) − I(k, l))²/(2σ_r²)).
  • I(i, j) and I(k, l) are the original reconstructed intensity values of samples (i, j) and (k, l).
  • σ_d is the spatial parameter.
  • σ_r is the range parameter.
  • the properties (or strength) of the bilateral filter are controlled by these two parameters. Samples located closer to the sample to be filtered, and samples having smaller intensity difference to the sample to be filtered, will have a larger weight than samples further away and with larger intensity difference.
  • In JVET-D0069 it is proposed to set σ_d based on the transform unit size, i.e. on min(TU block width, TU block height) (Eq. 2), and σ_r based on the QP used for the current block (Eq. 3).
  • the application of the bilateral filter after inverse transform can improve the objective coding efficiency for all intra and random access configuration.
  • Inter predicted blocks typically have less residual than intra predicted blocks and therefore it makes sense to filter the reconstruction of inter predicted blocks less.
  • Class F (optional): -1.84% -0.28% -0.27% 104% 105%
  • Class B: -0.46% -0.09% -0.12% 105% 104%
  • Class C: -0.61% -0.08% 0.12% 106% 106%
  • Ericsson may have current or pending patent rights relating to the technology described in this contribution and, conditioned on reciprocity, is prepared to grant licenses under reasonable and non-discriminatory terms as necessary for implementation of the resulting ITU-T Recommendation | ISO/IEC International Standard (per box 2 of the ITU-T/ITU-R/ISO/IEC patent statement and licensing declaration form).


Abstract

A method, performed by a filter, for filtering of a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value, wherein a pixel value is modified by a weighted combination of the pixel value and at least one spatially neighboring pixel value, and wherein the filtering is controlled by first and second parameters, σ_d and σ_r, wherein σ_d depends on a pixel distance between the pixel value and the neighboring pixel value, wherein σ_r depends on a pixel value difference between the pixel value and the neighboring pixel value. The method comprises storing weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs; sharing at least one LUT (LUTSHARED) for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, by: storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT (LUTSHARED); and deriving approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT (LUTSHARED).

Description

FILTERING OF VIDEO DATA USING A SHARED LOOK-UP TABLE
Technical Field
The present embodiments generally relate to filter apparatus and methods, for example to apparatus and methods for video coding and decoding, and in particular to deringing filtering in video coding and decoding.
Background
The latest video coding standard, H.265, also known as High Efficiency Video Coding (HEVC), is a block based video codec, developed by the Joint Collaborative Team on Video Coding (JCT-VC). It utilizes both temporal and spatial prediction. Spatial prediction is achieved using intra (I) prediction from within the current picture. A picture consisting of only intra coded blocks is referred to as an I-picture. Temporal prediction is achieved using inter (P) or bi-directional inter (B) prediction on block level. HEVC was finalized in 2013.
International Telecommunication Union (ITU) Telecommunication Standardization Sector (ITU-T) Video Coding Experts Group (VCEG) and International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Moving Picture Experts Group (MPEG) are studying the potential need for standardization of future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard. Such future standardization action could either take the form of additional extension(s) of HEVC or an entirely new standard. The groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by their experts in this area.
Ringing, also referred to as Gibbs phenomenon, appears in video frames as oscillations near sharp edges. It is a result of a cut-off of high-frequency information in the block Discrete Cosine Transform (DCT) transformation and lossy quantization process. Ringing also comes from inter prediction where sub-pixel interpolation using a filter with negative weights can cause ringing near sharp edges. Artificial patterns that resemble ringing can also appear from intra prediction, as shown in the right part of Figure 1 (whereby Figures 1 (A) and (B) illustrate the ringing effect on a zoomed original video frame and a zoomed compressed video frame respectively). The ringing effect degrades the objective and subjective quality of video frames.
As a non-iterative and straightforward filtering technique, bilateral filtering is widely used in image processing because of its edge-preserving and noise-reducing features. Unlike the conventional linear filters of which the coefficients are predetermined, a bilateral filter decides its coefficients based on the contrast of the pixels in addition to the geometric distance. A Gaussian function has usually been used to relate coefficients to the geometric distance and contrast of the pixel values.
For a pixel located at (i, j) which will be filtered using its neighboring pixel (k, l), the weight ω(i, j, k, l) assigned for pixel (k, l) to filter the pixel (i, j) is defined as:

ω(i, j, k, l) = e^(−((i − k)² + (j − l)²)/(2σ_d²) − (I(i, j) − I(k, l))²/(2σ_r²))     (1)

σ_d is the spatial parameter, and σ_r is the range parameter. The bilateral filter is controlled by these two parameters. I(i, j) and I(k, l) are the original intensity levels of pixels (i, j) and (k, l) respectively.
After the weights are obtained, they are normalized, and the final pixel value I_D(i, j) is given by:

I_D(i, j) = Σ_{k,l} I(k, l) · ω(i, j, k, l) / Σ_{k,l} ω(i, j, k, l)     (2)

I_D is the filtered intensity of pixel (i, j).
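The weight and normalization equations above translate directly into a floating point sketch (illustration only; a real implementation would use the LUT-based integer arithmetic described elsewhere in this document):

```python
import math

def bilateral_weight(i, j, k, l, I, sigma_d, sigma_r):
    """Weight assigned to pixel (k, l) when filtering pixel (i, j)."""
    dist2 = (i - k) ** 2 + (j - l) ** 2
    diff2 = (I[i][j] - I[k][l]) ** 2
    return math.exp(-dist2 / (2 * sigma_d ** 2) - diff2 / (2 * sigma_r ** 2))

def filter_sample(i, j, I, neighborhood, sigma_d, sigma_r):
    """Normalized weighted combination over the given neighborhood."""
    num = den = 0.0
    for (k, l) in neighborhood:
        w = bilateral_weight(i, j, k, l, I, sigma_d, sigma_r)
        num += I[k][l] * w
        den += w
    return num / den
```

On a flat region every weighted neighbor has the same intensity, so the normalized result reproduces the center pixel exactly, as expected of an edge-preserving filter.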
Rate-Distortion Optimization (RDO) is part of the video encoding process. It improves coding efficiency by finding the "best" coding parameters. It measures both the number of bits used for each possible decision outcome of the block and the resulting distortion of the block. A deblocking filter (DBF) and a Sample Adaptive Offset (SAO) filter are included in the HEVC standard. In addition to these, an Adaptive Loop Filter (ALF) has been added to a later version of the Future Video Codec. Among those filters, SAO will remove some of the ringing artifacts but there is still room for improvements.
A problem with deploying bilateral filters in video coding is that they are too complex and lack sufficient parameter settings and adaptivity.
The embodiments disclosed herein relate to further improvements to a filter.
Summary
It is an aim of the present invention to provide a method and apparatus which obviate or reduce at least one of the disadvantages mentioned above. According to a first aspect of the present invention there is provided a method, performed by a filter, for filtering of a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value, wherein a pixel value is modified by a weighted combination of the pixel value and at least one spatially neighboring pixel value. The method comprises storing weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs. The method comprises sharing at least one LUT for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, by storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT, and deriving approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT.
According to another aspect of the present invention there is provided a filter for filtering of a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value, the filter being configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value. The filter is operative to store weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs. The filter is operative to share at least one LUT for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, by storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT, and deriving approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT.
According to another aspect there is provided a decoder comprising a modifying means configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value, wherein weighting coefficients for use by the decoder for modifying pixel values are stored in one or more look up tables, LUTs. The decoder comprises at least one shared LUT for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation. The decoder is operative to store weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT, and derive approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT.
According to another aspect there is provided a computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to the methods described herein, and as defined in the appended claims.
According to another aspect, there is provided a computer program product comprising a computer-readable medium with the computer program as described above.
Brief description of the drawings
For a better understanding of examples of the present invention, and to show more clearly how the examples may be carried into effect, reference will now be made, by way of example only, to the following drawings in which:
Figures 1 (A) and (B) illustrate the ringing effect on a zoomed original video frame and a zoomed compressed video frame respectively; Figure 2 illustrates an 8x8 transform unit block and the filter aperture for the pixel located at (1,1);
Figure 3 illustrates a plus sign shaped deringing filter aperture;
Figure 4 illustrates a rectangular shaped deringing filter aperture of size MxN=3x3 pixels;
Figure 5 illustrates the steps performed in a filtering method according to an example;
Figure 6 illustrates a filter according to an example;
Figure 7 illustrates a data processing system in accordance with an example; Figure 8 shows an example of a method according to an embodiment; Figure 9 shows an example of a filter according to an embodiment; Figure 10 shows an example of a decoder according to an embodiment;
Figure 11 illustrates schematically a video encoder according to an embodiment; and Figure 12 illustrates schematically a video decoder according to an embodiment.
Detailed description
The following sets forth specific details, such as particular embodiments, for purposes of explanation and not limitation. But it will be appreciated by one skilled in the art that other embodiments may be employed apart from these specific details. In some instances, detailed descriptions of well-known methods, nodes, interfaces, circuits, and devices are omitted so as not to obscure the description with unnecessary detail. Those skilled in the art will appreciate that the functions described may be implemented in one or more nodes using hardware circuitry (e.g., analog and/or discrete logic gates interconnected to perform a specialized function, ASICs, PLAs, etc.) and/or using software programs and data in conjunction with one or more digital microprocessors or general purpose computers. Nodes that communicate using the air interface also have suitable radio communications circuitry. Moreover, where appropriate the technology can additionally be considered to be embodied entirely within any form of computer-readable memory, such as solid-state memory, magnetic disk, or optical disk containing an appropriate set of computer instructions that would cause a processor to carry out the techniques described herein.
Hardware implementation may include or encompass, without limitation, digital signal processor (DSP) hardware, a reduced instruction set processor, hardware (e.g., digital or analog) circuitry including but not limited to application specific integrated circuit(s) (ASIC) and/or field programmable gate array(s) (FPGA(s)), and (where appropriate) state machines capable of performing such functions.
In terms of computer implementation, a computer is generally understood to comprise one or more processors, one or more processing units, one or more processing modules or one or more controllers, and the terms computer, processor, processing unit, processing module and controller may be employed interchangeably. When provided by a computer, processor, processing unit, processing module or controller, the functions may be provided by a single dedicated computer, processor, processing unit, processing module or controller, by a single shared computer, processor, processing unit, processing module or controller, or by a plurality of individual computers, processors, processing units, processing modules or controllers, some of which may be shared or distributed. Moreover, these terms also refer to other hardware capable of performing such functions and/or executing software, such as the example hardware recited above. The filters described herein may be used in any form of user equipment, such as a mobile telephone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or computer. Although in the description below the term user equipment (UE) is used, it should be understood by those skilled in the art that "UE" is a non-limiting term comprising any mobile device, communication device, wireless communication device, terminal device or node equipped with a radio interface allowing for at least one of: transmitting signals in uplink (UL) and receiving and/or measuring signals in downlink (DL). A UE herein may comprise a UE (in its general sense) capable of operating or at least performing measurements in one or more frequencies, carrier frequencies, component carriers or frequency bands. It may be a "UE" operating in single- or multi-radio access technology (RAT) or multi-standard mode.
As well as "UE", the general terms "terminal device", "communication device" and "wireless communication device" are used in the following description, and it will be appreciated that such a device may or may not be 'mobile' in the sense that it is carried by a user. Instead, the term "terminal device" (and the alternative general terms set out above) encompasses any device that is capable of communicating with communication networks that operate according to one or more mobile communication standards, such as the Global System for Mobile communications, GSM, UMTS, Long-Term Evolution, LTE, etc. A UE may comprise a Universal Subscription Identity Module (USIM) on a smart-card or implemented directly in the UE, e.g., as software or as an integrated circuit. The operations described herein may be partly or fully implemented in the USIM or outside of the USIM.
An earlier co-pending patent application by the present Applicant, PCT/SE2017/050776 filed on 11 July 2017, describes a dedicated deringing filter in HEVC, which introduces a deringing filter into the Future Video Codec (the successor to HEVC). The deringing filter proposed in the earlier application is evolved from a bilateral filter, proposes some simplifications, and describes how to adapt the filtering to local parameters in order to improve the filtering performance.
The embodiments described herein are concerned with reducing the amount of look- up-tables (LUTs) needed in filters, including for example a deringing filter as described in the earlier application.
According to embodiments described herein, a deringing filter apparatus and method reuse at least one look-up table (LUT) for both inter and intra, and use an approximation function, for example a scaling factor, or a scaling factor and offset, to obtain an approximation of one from the other.
An advantage of embodiments described herein is that the size of the LUT is reduced, for example by 50%. This is advantageous in hardware implementations, where LUT size comes at a premium. As an example, in a filter as proposed in JCTVC-E0032 each pixel uses four or five weights. In some implementations it may be of interest to read these four or five weights at the same time. This would mean that the LUT would have to be implemented not once, but four or five times. The filter according to the embodiments described herein may be implemented in a video encoder and a video decoder. It may be implemented in hardware, in software or a combination of hardware and software. The filter may be implemented in, e.g. comprised in, user equipment, such as a mobile telephone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or a computer.
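As an illustration of the shared-LUT idea, the following sketch stores intra weights in a single byte-valued LUT and derives approximated inter weights from it with an approximation function. The helper names, and the scale and offset values, are placeholders for illustration only, not values mandated by the embodiments:

```python
import math

def build_intra_lut(sigma_r, size=60):
    # One byte per entry, indexed on the absolute intensity difference d:
    # round(255 * e^(-d^2 / (2*sigma_r^2)))
    return [round(255 * math.exp(-(d * d) / (2 * sigma_r * sigma_r)))
            for d in range(size)]

def approx_inter_weight(shared_lut, d, scale=0.5, offset=0):
    # Derive the inter weight from the shared intra LUT using a scaling
    # factor (and optional offset) instead of storing a second LUT.
    idx = min(d, len(shared_lut) - 1)
    return max(0, round(shared_lut[idx] * scale + offset))
```

Halving the number of stored tables in this way is what yields the roughly 50% LUT-size reduction mentioned above.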
Prior to describing the embodiments of the present invention, reference will first be made to the examples of the earlier co-pending application PCT/SE2017/050776 filed on 11 July 2017, to provide background and context for the present invention. The embodiments of the earlier application will be referred to below as examples.
The examples of the earlier application provide advantages in that the proposed filtering removes ringing artifacts in compressed video frames, so a better video quality (both objectively and subjectively) can be achieved with a small increase in codec complexity. Objectively, coding efficiency as calculated by Bjøntegaard-Delta bit rate (BD-rate) is improved by between 0.5 and 0.7%.
Example 1
According to a first example, a bilateral deringing filter with a plus sign shaped filter aperture is used directly after inverse transform. An identical filter and identical filtering process is used in the corresponding video encoder and decoder to ensure that there is no drift between the encoder and the decoder.
The first example describes a way to remove ringing artifacts by using a deringing filter designed in the earlier application. The deringing filter is evolved from a bilateral filter.
By applying the deringing filter, each pixel in the reconstructed picture is replaced by a weighted average of itself and its neighbors. For instance, a pixel located at (i, j) will be filtered using its neighboring pixel (k, l). The weight ω(i, j, k, l) is the weight assigned to pixel (k, l) for filtering the pixel (i, j), and it is defined, as mentioned earlier, as:

ω(i, j, k, l) = e^( -((i-k)^2 + (j-l)^2)/(2σ_d^2) - ||I(i, j) - I(k, l)||^2/(2σ_r^2) )     (1)

I(i, j) and I(k, l) are the original reconstructed intensity values of pixels (i, j) and (k, l) respectively. σ_d is the spatial parameter, and σ_r is the range parameter. The bilateral filter is controlled by these two parameters. In this way, the weight of a reference pixel (k, l) for the pixel (i, j) is dependent both on the distance between the pixels and on the intensity difference between the pixels. In this way, pixels located closer to the pixel to be filtered, and that have a smaller intensity difference to the pixel to be filtered, will have a larger weight than the other more distant (spatially or in intensity) pixels. In this example, σ_d and σ_r are constant values.
The deringing filter, in this example, is applied to each TU block after inverse transform in an encoder, as shown in Figure 2, which shows an example of an 8x8 block. This means, for example, that subsequent intra-coded blocks will predict from the filtered pixel values. The filter may also be used during R-D optimization in the encoder. The identical deringing filter is also applied to each TU block after inverse transform in the corresponding video decoder.
In this example, each pixel in the transform unit is filtered using its direct neighboring pixels only, as shown in Figure 3. The filter has a plus sign shaped filter aperture centered at the pixel to be filtered.
The output filtered pixel intensity I_D(i, j), as mentioned earlier, is defined as:

I_D(i, j) = Σ_{k,l} I(k, l) * ω(i, j, k, l) / Σ_{k,l} ω(i, j, k, l)     (2)
In an efficient implementation of the first example, in a video encoder/decoder, all possible weights (coefficients) of the proposed deringing filter are calculated and stored in a two-dimensional look-up table (LUT). The LUT can, for instance, use the spatial distance and the intensity difference between the pixel to be filtered and the reference pixels as indices into the LUT. In the case where the filter aperture is a plus sign, there will only be two distances: the distance 0 for the middle pixel and the distance 1 for the other four pixels. Furthermore, the middle pixel will not have any intensity difference (since the middle pixel is the filtered pixel) and therefore its weight will always be e^0 = 1 when calculated using Equation 1. Thus in the case of the plus-shaped filter of Figure 3, a one-dimensional lookup table (LUT), indexed on the difference in intensity, or on the absolute value of the difference in intensity, is sufficient. Instead of one LUT one could have one LUT dedicated to a weight dependent on distance from the current pixel (w_d) and another LUT dedicated to a weight dependent on closeness in pixel value (w_r). It should be noted that the exponential function used to determine the weights could be some other function as well. The LUT could be optimized based on some error metric (SSD, SSIM) or according to human vision.
Instead of one LUT one could also have one LUT for weights vertically above or below the current pixel and another LUT for weights horizontally to the left or right of the current pixel.
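The LUT-based plus-shaped filter of Equations 1 and 2 can be sketched as follows (a minimal floating-point version, assuming a fixed σ_r; the function names are illustrative): one range-weight LUT indexed on the absolute intensity difference, combined with a single spatial factor for the four neighbors.

```python
import math

def make_range_lut(sigma_r, max_diff=1024):
    # Range weight e^(-d^2/(2*sigma_r^2)) indexed on |I(i,j) - I(k,l)|.
    return [math.exp(-(d * d) / (2 * sigma_r * sigma_r)) for d in range(max_diff)]

def filter_pixel_plus(center, neighbors, sigma_d, lut):
    # Middle pixel: distance 0 and zero intensity difference -> weight 1.
    # Each of the four neighbors: spatial weight e^(-1/(2*sigma_d^2))
    # times the range weight fetched from the LUT (Equation 2).
    w_spatial = math.exp(-1.0 / (2 * sigma_d * sigma_d))
    num, den = float(center), 1.0
    for n in neighbors:
        w = w_spatial * lut[abs(n - center)]
        num += n * w
        den += w
    return num / den
```

A flat neighborhood is left unchanged, while a pixel below its neighbors is pulled part of the way toward them, never past them.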
Example 2
According to the second example of the earlier application, a deringing filter with a rectangular shaped filter aperture is used in the video encoder's R-D optimization process. The same filter is also used in the corresponding video decoder.
In the second example each pixel is filtered using its neighboring pixels within an M by N size rectangular shaped filter aperture centered at the pixel to be filtered, as shown in Figure 4. The same deringing filter as in the first example is used.
Example 3
The deringing filter according to the third example of the earlier application is used after prediction and transform have been performed for an entire frame or part of a frame. The same filter is also used in the corresponding video decoder.
The third example is the same as the first or second example, except that the filtering is not done right after the inverse transform. Instead the proposed filter is applied to the reconstructed picture in both encoder and decoder. On the one hand this could lead to worse performance since filtered pixels will not be used for intra prediction, but on the other hand the difference is likely very small and the existing filters are currently placed at this stage of the encoder/decoder.
Example 4
In this example, σ_d and/or σ_r are related to TU size.
The σ_d and σ_r can be a function of the form (e.g. a polynomial function):

σ_d = f_1(TU size)
σ_r = f_2(TU size)

If both σ_d and σ_r are derived based on TU size, a preferred example is to have different functions f_1 ≠ f_2. If the transform unit is non-quadratic, it may be possible to instead use σ_d = 0.92 - min{TU block width, TU block height} * 0.025. Alternatively, it is possible to use σ_d = 0.92 - max{TU block width, TU block height} * 0.025, or σ_d = 0.92 - mean{TU block width, TU block height} * 0.025, where mean{a, b} = (a + b)/2. When the transform size is different in the vertical and horizontal directions, σ_d can be separate for the filter coefficients vertically and horizontally, so that σ_d_ver, σ_d_hor and σ_r_ver, σ_r_hor are functions of the form (e.g. a polynomial function):

σ_d_hor = f(TU width)
σ_d_ver = f(TU height)
σ_r_hor = f(TU width)
σ_r_ver = f(TU height)

For example: σ_d_hor = 0.92 - (TU block width) * 0.025, σ_d_ver = 0.92 - (TU block height) * 0.025
A further generalization is to have a weight and/or size dependent on distance given by a function of TU size, TU width or TU height, and a weight and/or size dependent on pixel closeness given by a function of TU size, TU width or TU height.
Example 5
In this example σ_d and σ_r are related to the QP value. Thus σ_d and σ_r can be functions of the form:

σ_d = f_3(QP)
σ_r = f_4(QP)

A preferred function is σ_r = clip((QP - 17) * 2^(bit_depth - 8) / 8, 0.01), wherein bit_depth corresponds to the video bit depth, i.e. the number of bits used to represent pixels in the video. In the particular case when bit_depth = 10, we have σ_r = clip((QP - 17)/2, 0.01). If both σ_d and σ_r are derived based on QP, a preferred example is to have different functions f_3 ≠ f_4.
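A sketch of the preferred range-parameter function (reconstructed here so that the 10-bit special case equals (QP - 17)/2 clipped at 0.01; the function name is illustrative):

```python
def sigma_r_from_qp(qp, bit_depth=10):
    # clip((QP - 17) * 2^(bit_depth - 8) / 8, 0.01): the lower clip keeps
    # sigma_r positive; the power-of-two factor rescales with bit depth.
    val = (qp - 17) * (2 ** (bit_depth - 8)) / 8.0
    return max(val, 0.01)
```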
The QP mentioned here relates to the coarseness of the quantization of transform coefficients. The QP can correspond to a picture or slice QP or even a locally used QP, i.e. the QP for a TU block. QP can be defined differently in different standards, so that the QP in one standard does not correspond to the QP in another standard. In HEVC, and so far in JEM, six steps of QP change double the quantization step. This could be different in a final version of H.266, where steps could be finer or coarser and the range could be extended beyond 51. Thus, in a general example the range parameter is a polynomial model, for example a first-order model, of the QP.
Another approach is to define a table with an entry for each QP, where each entry relates to the reconstruction level of at least one transform coefficient quantized to 1 with that QP. For instance, a table of σ_d and/or a table of σ_r is created where each entry, i.e., QP value, relates to the reconstruction level, i.e., pixel value after inverse transform and inverse quantization, for one transform coefficient quantized to 1 with that QP, e.g., the smallest possible value a quantized transform coefficient can have. This reconstruction level indicates the smallest pixel value change that can originate from a true signal. Changes smaller than half of this value can be regarded as coding noise that the deringing filter should remove.
Yet another approach is to have the weights dependent on quantization scaling matrices. Especially relevant are the scaling factors for the higher-frequency transform coefficients, since ringing artefacts are due to quantization of higher-frequency transform coefficients. Currently, HEVC uses by default a uniform reconstruction quantization (URQ) scheme that quantizes frequencies equally. HEVC has the option of using quantization scaling matrices, also referred to as scaling lists, either default ones, or quantization scaling matrices that are signaled as scaling list data in the sequence parameter set (SPS) or picture parameter set (PPS). To reduce the memory needed for storage, scaling matrices are typically only specified for 4x4 and 8x8 matrices. For the larger transformations of sizes 16x16 and 32x32, the signaled 8x8 matrix is applied by having 2x2 and 4x4 blocks share the same scaling value, except at the DC positions.
A scaling matrix, with individual scaling factors for respective transform coefficients, can be used to produce a different quantization effect for each transform coefficient by scaling the transform coefficients individually with the respective scaling factor as part of the quantization. This enables, for example, that the quantization effect is stronger for higher frequency transform coefficients than for lower frequency transform coefficients. In HEVC, default scaling matrices are defined for each transform size and can be invoked by flags in the SPS and/or the PPS. Scaling matrices also exist in H.264. In HEVC it is also possible to define one's own scaling matrices in the SPS or PPS specifically for each combination of color component, transform size and prediction type (intra or inter mode).
In an example, deringing filtering is performed for at least reconstruction sample values from one transform coefficient using the corresponding scaling factor, as the QP, to determine ad and/or σΓ. This could be performed before adding the intra/inter prediction or after adding the intra/inter prediction. Another less complex approach would be to use the maximum or minimum scaling factor, as the QP, to determine ad and/or σΓ.
The size of the filter can also be dependent on the QP so that the filter is larger for large QPs than for small QPs. For instance, the width and/or the height of the filter kernel of the deringing filter is defined for each QP. Another example is to use a first width and/or a first height of the filter kernel for QP values equal to or smaller than a threshold and a second, different width and/or a second, different height for QP values larger than the threshold.
Example 6
In this example σ_d and σ_r are related to the video resolution. The σ_d and σ_r can be functions of the form:

σ_d = f_5(frame diagonal)
σ_r = f_6(frame diagonal)

The size of the filter can also be dependent on the size of the frame. If both σ_d and σ_r are derived based on the frame diagonal, a preferred example is to have different functions f_5 ≠ f_6.
Small resolutions can contain sharper texture than large resolutions, which can cause more ringing when coding small resolutions. Accordingly, at least one of the spatial parameter and the range parameter can be set such that stronger deringing filtering is applied for small resolutions as compared to large resolutions.
Example 7
According to this example σ_d and σ_r are related to QP, TU block size, video resolution and other video properties. The σ_d and σ_r can be functions of the form:

σ_d = f_7(QP, TU size, frame diagonal, ...)
σ_r = f_8(QP, TU size, frame diagonal, ...)

An example may comprise example 1 combined with the functions:

σ_d = 0.92 - (TU block width) * 0.025

and

σ_r = (QP - 17) / 2
Example 8
In this example the de-ringing filter is applied if an inter prediction is interpolated (e.g. not integer-pixel motion), or the intra prediction is predicted from reference samples in a specific direction (e.g. non-DC), or the transform block has non-zero transform coefficients.
De-ringing can be applied directly after intra/inter prediction to improve the accuracy of the prediction signal, or directly after the transform on residual samples to remove transform effects, or on reconstructed samples (after addition of intra/inter prediction and residual) to remove ringing effects from both prediction and transform, or on both the intra/inter prediction and the residual or reconstruction.
Example 9
In this example, the filter weights (w_d, w_r or similarly σ_d, σ_r) and/or filter size can be set individually for intra prediction mode and/or inter prediction mode.
The filter weights and/or filter size can be different in the vertical and horizontal directions depending on the intra prediction mode or the interpolation filter used for inter prediction. For example, if close-to-horizontal intra prediction is performed, the weights could be smaller for the horizontal direction than for the vertical direction, and for close-to-vertical intra prediction the weights could be smaller for the vertical direction than for the horizontal direction. If sub-pel interpolation with an interpolation filter with negative filter coefficients is applied only in the vertical direction, the filter weights could be smaller in the horizontal direction than in the vertical direction, and if such a sub-pel interpolation filter is applied only in the horizontal direction, the filter weights could be smaller in the vertical direction than in the horizontal direction.
Example 10
In this example, the filter weights (w_d, w_r or similarly σ_d, σ_r) and/or filter size can depend on the position of non-zero transform coefficients.
The filter weights and/or filter size can be different in the vertical and horizontal directions depending on the non-zero transform coefficient positions. For example, if non-zero transform coefficients only exist in the vertical direction at the lowest frequency in the horizontal direction, the filter weights can be smaller in the horizontal direction than in the vertical direction. Alternatively, the filter is only applied in the vertical direction. Similarly, if non-zero transform coefficients only exist in the horizontal direction at the lowest frequency in the vertical direction, the filter weights can be smaller in the vertical direction than in the horizontal direction. Alternatively, the filter is only applied in the horizontal direction.
The filter weights and/or filter size can also be dependent on existence of non-zero transform coefficients above a certain frequency. The filter weights can be smaller if only low frequency non-zero transform coefficients exist than when high frequency non-zero transform coefficients exist.
Example 11
In this example, the filter weights (w_d, w_r or similarly σ_d, σ_r) and/or filter size can be different depending on the transform type.
Type of transform can refer to transform skip, KLT-like transforms, DCT-like transforms, DST transforms, non-separable 2D transforms, rotational transforms and combinations of those.
As an example, the bilateral filter could be applied only to fast transforms, with the weight equal to 0 for all other transform types. Different types of transforms can require smaller weights than others since they cause less ringing than other transforms.
When transform skip is used, no transform is applied and ringing will then not come from the basis functions of the transform. Still, there will be some quantization error due to quantization of the residual that benefits from deringing filtering. However, in such a case the weight could potentially be smaller in order to avoid overfiltering. More specialized transforms like KLT could possibly also benefit from filtering, but likely less strong filtering, i.e., smaller filter weights and σ_d, σ_r, than for DCT and DST.
Example 12
In this example, the filtering may be implemented as a differential filter whose output is clipped (Clip) to be larger than or equal to a MIN value and less than or equal to a MAX value, and added to the pixel value, instead of using a smoothing filter kernel like the Gaussian:

I_D(i, j) = I(i, j) + s * Clip(MIN, MAX, Σ_{k,l} I(k, l) * ω(i, j, k, l))     (3)
The differential filter can for example be designed as the difference between a dirac function and a Gaussian filter kernel. A sign (s) can optionally also be used to make the filtering enhance edges rather than smooth edges, if that is desired in some cases.
The MAX and MIN value can be a function of other parameters as discussed in other examples. The usage of a clipping function can be omitted but allows for an extra freedom to limit the amount of filtering enabling the use of a stronger bilateral filter although limiting how much it is allowed to change the pixel value.
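A minimal sketch of the clipped differential formulation of Equation 3 (the kernel weight and the MIN/MAX limits below are illustrative placeholders, not values from the examples):

```python
def clip(lo, hi, x):
    return max(lo, min(hi, x))

def differential_filter(center, neighbors, weight=0.25, s=1, lo=-4, hi=4):
    # Sum weighted differences to the center pixel (a dirac-minus-Gaussian
    # style kernel), clip the correction to [lo, hi], then add it back.
    correction = sum(weight * (n - center) for n in neighbors)
    return center + s * clip(lo, hi, correction)
```

The clip limits how much the filter may move a pixel, so a stronger kernel can be used safely; s = -1 would turn the smoothing into edge enhancement.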
To allow for different MAX and MIN values in the horizontal and the vertical direction, the filtering can be described as a vertical filtering part and a horizontal filtering part as shown below:

I_D(i, j) = I(i, j) + s * ( Clip(MIN_ver, MAX_ver, Σ_{k,l ∈ ver} I(k, l) * ω(i, j, k, l)) + Clip(MIN_hor, MAX_hor, Σ_{k,l ∈ hor} I(k, l) * ω(i, j, k, l)) )     (4)

The MAX_hor, MAX_ver, MIN_hor and MIN_ver can be functions of other parameters as discussed in other examples.
Example 13
One aspect is to keep the size of a LUT small. In case we set the σ_d and σ_r parameters using

σ_d = 0.92 - (TU block width) * 0.025

and

σ_r = (QP - 17) / 2

then the size of the LUT can become quite big. As an example, if we assume 10-bit accuracy, the absolute difference between two luma values can be between 0 and 1023. Thus if we know the TU block width and the QP, we need to store 1024 values, which in floating point occupies 4096 bytes.
There are four different TU sizes available. This means that we need 4 look-up tables of size 4096, which equals 16384 bytes or 16 kilobytes. This can be expensive in a hardware implementation. Therefore, in one example, we take advantage of the fact that Equation 1 can be rewritten as

ω(i, j, k, l) = e^( -((i-k)^2 + (j-l)^2)/(2σ_d^2) ) * e^( -||I(i, j) - I(k, l)||^2/(2σ_r^2) )     (5)

If we keep σ_r fixed, we can now create one LUT for the expression e^( -||I(i, j) - I(k, l)||^2/(2σ_r^2) ), which will occupy 4096 bytes. The first factor of the expression in Equation 5 depends on σ_d. Since there are four TU sizes, there are four different possible values of σ_d. Thus a LUT of only four values is sufficient to obtain e^( -((i-k)^2 + (j-l)^2)/(2σ_d^2) ).
Four values can be stored in 4*4 = 16 bytes. Thus in this solution we have lowered the storage needs for the LUT from 16384 bytes to 4096 + 16 = 4112 bytes, or approximately 4 kB. Now, for the special case with the plus-shaped filter, we can further notice that the distance (i-k)^2 + (j-l)^2 will always be equal to 1 (in the case of the four neighbors) or 0 (in the case of the middle pixel). We can therefore write

ω(i, j, k, l) = e^( -1/(2σ_d^2) ) * e^( -||I(i, j) - I(k, l)||^2/(2σ_r^2) )   if (k, l) is a neighbor pixel
ω(i, j, k, l) = 1   if (k, l) is the middle pixel

where we have used the fact that ω(i, j, k, l) is equal to 1 for the middle pixel, and we can write 1 as

1 = e^( -1/(2σ_d^2) ) * e^( 1/(2σ_d^2) )

This means that we can write ω(i, j, k, l) as ω(i, j, k, l) = e^( -1/(2σ_d^2) ) * n(i, j, k, l), where

n(i, j, k, l) = e^( -||I(i, j) - I(k, l)||^2/(2σ_r^2) )   if (k, l) is a neighbor pixel
n(i, j, k, l) = e^( 1/(2σ_d^2) )   if (k, l) is the middle pixel.
Equation (2) thus becomes:

I_D(i, j) = Σ_{k,l} I(k, l) * e^( -1/(2σ_d^2) ) * n(i, j, k, l) / Σ_{k,l} e^( -1/(2σ_d^2) ) * n(i, j, k, l)     (6)

and we can see that we can divide both the numerator and the denominator by e^( -1/(2σ_d^2) ), which yields

I_D(i, j) = Σ_{k,l} I(k, l) * n(i, j, k, l) / Σ_{k,l} n(i, j, k, l)     (7)

If we let I_0 be the intensity of the middle pixel, I_0 = I(i, j), and we let the intensity of the neighboring upper pixel be I_1 = I(i, j-1), the intensity of the neighboring right pixel be I_2 = I(i+1, j), the intensity of the neighboring left pixel be I_3 = I(i-1, j) and the intensity of the neighboring lower pixel be I_4 = I(i, j+1), we can write Equation 7 as

I_D = ( I_0 * e^( 1/(2σ_d^2) ) + Σ_{m=1..4} I_m * e^( -||I_m - I_0||^2/(2σ_r^2) ) ) / ( e^( 1/(2σ_d^2) ) + Σ_{m=1..4} e^( -||I_m - I_0||^2/(2σ_r^2) ) )
The largest possible value of e^( -||I_m - I_0||^2/(2σ_r^2) ) comes when the difference in intensity is zero, which will give a value of 1.0. Assume that we want to use 8 bits for the filtering. We then simply store the value round(255 * e^( -ΔI^2/(2σ_r^2) )) in the LUT, where ΔI is the difference in intensity used as LUT index. By doing this, we can use a single byte per LUT entry, which means that we can go down from 1024*4 + 16 = 4112 bytes to 1024 + 16 = 1040 bytes, or about 1 kByte. Furthermore, we know that the largest possible value for σ_r will be 16.5 (assuming the largest QP we will use is 50), which means that every LUT entry where the difference in intensity is larger than 59 will get a value before rounding smaller than

255 * e^( -59^2/(2*16.5^2) ) = 0.4267

which will be rounded to zero. Hence it is not necessary to extend the LUT to more than 59. This reduces the LUT size to 60 + 16 = 76 bytes or about 0.07 kilobyte. The difference in intensity can be checked against 59, and if it is larger than 59 it is set to 59. The value that will be fetched from the LUT will then be 0 (since the LUT entry for 59 is zero), which is correct.
An alternative is to make the LUT larger up to the nearest power of two minus one, in this case 31. Thus it is sufficient to check if any bit larger than bit 5 is set. If so, 31 is used, otherwise the value is used as is.
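The byte-valued LUT of this example can be sketched as follows (assuming σ_r = 16.5, the largest value considered above; the helper names are illustrative):

```python
import math

def build_byte_lut(sigma_r=16.5, size=60):
    # round(255 * e^(-d^2/(2*sigma_r^2))) for d = 0..59; with sigma_r at
    # most 16.5, every entry beyond index 59 would round to zero anyway.
    return [round(255 * math.exp(-(d * d) / (2 * sigma_r * sigma_r)))
            for d in range(size)]

def range_weight(lut, diff):
    # Clamp the intensity difference to 59 before the lookup; entry 59 is
    # zero, so clamped large differences correctly get weight 0.
    return lut[min(abs(diff), len(lut) - 1)]
```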
Example 14
In this example, the approach as described above can be implemented with filtering in float or in integers (8, 16 or 32 bit). Typically, a table lookup is used to determine the respective weight. Here is an example of filtering in integers that avoids division by doing a table lookup of a multiplication factor and a shift factor:

I_D(i, j) = ( lookup_M(A) * Σ_{k,l} I(k, l) * ω(i, j, k, l) + roundF ) >> lookup_Sh(A), where A = Σ_{k,l} ω(i, j, k, l)

Here lookup_M determines a multiplication factor to bring the gain of the filtering close to unity (the weights sum up to 1 << lookup_Sh), given that the "division" using right shift (>>) has the shift value (lookup_Sh) limited to be a multiple of 2. lookup_Sh(A) gives a shift factor that, together with the multiplication factor lookup_M, gives a sufficient approximation of 1/A. roundF is a rounding factor equal to 1 << (lookup_Sh - 1). If this approximation is done so that the gain is less than or equal to unity, the filtering will not increase the value of the filtered pixel beyond the pixel values in the neighborhood before the filtering.
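The division-free normalisation can be sketched as follows (the lookup tables are replaced by a small helper; the shift value and the way M is derived are illustrative, with M floored so the gain stays at or below unity):

```python
def div_factors(a, shift=15):
    # Approximate 1/a as m / 2^shift; flooring m keeps the gain <= unity.
    return (1 << shift) // a, shift

def normalize(weighted_sum, weight_total):
    # (m * sum + roundF) >> shift approximates sum / weight_total.
    m, sh = div_factors(weight_total)
    round_f = 1 << (sh - 1)
    return (weighted_sum * m + round_f) >> sh
```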
Example 15
In this example, one approach to reduce the amount of filtering is to omit filtering if the sum of the weights is equal to the weight for the center pixel. Another approach is to consider which weight is needed on the neighboring pixels to be able to change the value of the current pixel. Let wn be the sum of the neighboring weights and wtot the total sum of weights including the center pixel, and consider 10-bit data, 0 to 1023. To get an impact of 1, wn must satisfy (1023*wn)/wtot = 1 -> wn >= wtot/1023, or in a fixed-point implementation wn >= (wtot + (1 << 9)) >> 10. Thus if the sum of the neighboring weights is below this, no filtering needs to be deployed, since the filtering will anyway not change the pixel value.
Example 16
The filtering as described in other examples can alternatively be performed as separable filtering in the horizontal and vertical directions instead of the 2D filtering mostly described in other examples. In addition to the above examples described in the earlier application, the following examples will now be described in order to provide further context to the present embodiments.
Example 17
In this example one set of weights (w_d, w_r or similarly σ_d, σ_r) and/or filter size is used for blocks that have been intra predicted and another set of weights and/or filter size is used for blocks that have been inter predicted. Typically the weights are set to reduce the amount of filtering for blocks which have been predicted with higher quality compared to blocks that have been predicted with lower quality. Since blocks that have been inter predicted typically have higher quality than blocks that have been intra predicted, they are filtered less to preserve the prediction quality.
One example is to have one wd or similarly σd for blocks that have been intra predicted and a smaller wd or similarly σd for blocks that have been inter predicted.
Example weights for intra predicted blocks are:

σd = 0.92 − min(TU block width, TU block height) * 0.025

Example weights for inter predicted blocks are:

σd = 0.72 − min(TU block width, TU block height) * 0.025
Example 18
In this example one set of weights (wd, wr or similarly σd, σr) and/or one filter size depends on the picture type/slice type.
One example is to use one set of weights for intra pictures/slices and another set of weights for inter pictures/slices, e.g. one wd or similarly σd for pictures/slices that have only been intra predicted and a smaller wd or similarly σd for other pictures/slices.
Example weights for intra pictures/slices (e.g. I_SLICE) are:

σd = 0.92 − min(TU block width, TU block height) * 0.025
Example weights for inter pictures/slices (e.g. P_SLICE, B_SLICE) are:

σd = 0.72 − min(TU block width, TU block height) * 0.025
B slices (bi-prediction allowed), which typically have better prediction quality than P slices (only single prediction), can in another variant of this example have a smaller weight than P slices. In another variant, generalized B-slices that are used instead of P-slices for uni-directional prediction can have the same weight as P-slices, while "normal" B-slices, which can predict from both future and past, can have a larger weight than generalized B-slices.
Example weights for "normal" B-slices are:

σd = 0.82 − min(TU block width, TU block height) * 0.025
Example 19
In this example one set of weights (wd, wr or similarly σd, σr) and/or one filter size is used for intra pictures/slices, another set of weights is used for inter pictures/slices that are used as reference for prediction of other pictures, and a third set of weights is used for inter pictures/slices that are not used as reference for prediction of other pictures.
One example is to have one wd or similarly σd for pictures/slices that have only been intra predicted, a somewhat smaller wd or similarly σd for pictures/slices that have been inter predicted and are used for predicting other pictures, and the smallest wd or similarly σd for pictures/slices that have been inter predicted but are not used for prediction of other pictures (non-reference pictures).
Example weights for intra pictures/slices (e.g. I_SLICE) are:

σd = 0.92 − min(TU block width, TU block height) * 0.025
Example weights for inter pictures/slices (e.g. P_SLICE, B_SLICE) that are not used for reference (non-reference pictures) are:

σd = 0.82 − min(TU block width, TU block height) * 0.025
Example weights for inter pictures/slices (e.g. P_SLICE, B_SLICE) that are used for reference are:

σd = 0.72 − min(TU block width, TU block height) * 0.025
Example 20
In this example, to enable some adaptivity with respect to the used weights (at least one of, or all of, wd, wr or similarly σd, σr), an encoder can select which values of the weights to use and encode them in the SPS (sequence parameter set), PPS (picture parameter set) or slice header.
A decoder can then decode the values of the weights to be used for filtering the respective picture/slice. In a variant of this example, separate values of the weights for blocks that are intra predicted and for blocks that are inter predicted are encoded in the SPS/PPS or slice header. A decoder can then decode the values of the weights to be used for blocks that are intra predicted and the values of the weights to be used for blocks that are inter predicted.
A data processing system, as illustrated in Figure 7, can be used to implement the filter of the examples described above. The data processing system includes at least one processor that is further coupled to a network interface via an interconnect. The at least one processor is also coupled to a memory via the interconnect. The memory can be implemented by a hard disk drive, flash memory, or read-only memory and stores computer-readable instructions. The at least one processor executes the computer-readable instructions and implements the functionality described above. The network interface enables the data processing system to communicate with other nodes in a network. Alternative examples may include additional components responsible for providing additional functionality, including any functionality described above and/or any functionality necessary to support the solution described herein.
Next, embodiments of the present invention will be described.
A filter as described in the embodiments below, or the examples above, may be implemented in a video encoder and a video decoder. It may be implemented in hardware, in software or a combination of hardware and software. The filter may be implemented in, e.g. comprised in, user equipment, such as a mobile telephone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or a computer.
As mentioned above, the embodiments described herein are concerned with reducing the number of look-up tables (LUTs) needed, or the size of the LUTs needed. According to embodiments described herein, a filter, such as a deringing filter, apparatus and method reuse the same look-up table LUT for both inter and intra, and use an approximation function, for example a scaling factor, or a scaling factor and an offset, to obtain an approximation of one from the other.
Figure 8 discloses a method according to an embodiment, performed by a filter, for filtering of a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value. A pixel value is modified by a weighted combination of the pixel value and at least one spatially neighboring pixel value.
The method comprises storing weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look-up tables, LUTs, step 801.
The method comprises sharing at least one LUT (LUTSHARED) for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, step 803.
The sharing is performed by storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT (LUTSHARED), step 805.
Approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation are derived using an approximation function to modify the weighting coefficients stored in the at least one shared LUT (LUTSHARED), step 807.

According to one embodiment, the approximation function comprises a scaling factor (s), for scaling a weighting coefficient ω obtained from the at least one shared LUT (LUTSHARED) by the scaling factor (s).
According to another embodiment, the approximation function comprises a scaling factor (s) for scaling a weighting coefficient ω obtained from the at least one shared LUT (LUTSHARED) by the scaling factor (s), and an offset value (p).
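As a hypothetical illustration of steps 801-807 (the LUT contents and the scale and offset values below are placeholders, not values from the description):

```python
# Sketch: one shared LUT stores intra weighting coefficients (steps 801-805);
# inter coefficients are derived on the fly as scale * w + offset (step 807).
# All numbers are illustrative placeholders.

LUT_SHARED = [1.00, 0.80, 0.55, 0.30]       # indexed by |I(i,j) - I(k,l)| bucket

def weighting_coefficient(diff_bucket, mode, scale=0.7, offset=0.0):
    w = LUT_SHARED[diff_bucket]             # lookup in the shared LUT
    if mode == "intra":
        return w                            # stored coefficients used directly
    return scale * w + offset               # approximation function for inter

print(weighting_coefficient(1, "intra"))    # 0.8
print(weighting_coefficient(1, "inter"))
```

The point of the sketch is only that one table serves both prediction modes; the following sections derive what the scale (and optional offset) should actually be.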
Further details will now be provided in relation to how a LUT may be shared, and how the size of a LUT may be reduced, according to various embodiments, and the different types of approximation function that may be used with different embodiments.
It is noted that the embodiments will be described in terms of weighting coefficients relating to intra decoding operations being stored in a shared LUT (i.e. weighting coefficients ωintra), and approximated weighting coefficients relating to inter decoding operations being derived therefrom (i.e. weighting coefficients ωinter). It is noted, however, that the reverse is also possible, whereby weighting coefficients relating to inter decoding operations are stored in a shared LUT and approximated weighting coefficients relating to intra decoding operations are derived therefrom, with the approximation functions being adapted accordingly. A preferred approach, however, is to store the intra LUT and re-use it for inter, because the strength of filtering is typically larger for intra, and thus the LUT for intra contains more non-zero weights that can be scaled away when re-using it for inter. In this way, a higher accuracy of the filter weights, as well as of the resulting filtered value, can be preserved.
In the earlier co-pending patent application PCT/SE2017/050776 mentioned above, there is described a method where σd is calculated using the Transform Unit size, TU-size, and σr is calculated using the Quantization Parameter, QP. Efficient ways to calculate these are, for example, described in JVET-E0032 (forming Annex A at the end of this description). For intra blocks, the following equation, for example, may be used:

σd = 0.92 − min(TU block width, TU block height) / 40    (Eq. 10)

and for inter blocks, the following equation may be used:

σd = 0.72 − min(TU block width, TU block height) / 40    (Eq. 11)

For both inter and intra blocks, the following equation, for example, may be used:

σr = max((QP − 17) / 2, 0.01)    (Eq. 12)
In one example, a 3D LUT is used. Since there are, for example, 52 possible QP values, three different relevant TU sizes, and 1023 possible intensity differences, the number of elements in a 3D LUT in such an example becomes 1023*52*3 = 159588 values. If 8 bits per value are used, for example, then this becomes 159588 bytes.
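A sketch of how such a 3D LUT could be populated (the three TU sizes and the 8-bit quantization of the stored weights are assumptions for illustration; σd follows the intra formula Eq. 10 and σr the QP-based formula from the earlier application):

```python
import math

# Sketch: populate the full 3D LUT of weights indexed by
# (|intensity difference|, QP, TU-size index). The TU sizes and the
# 8-bit quantization of the weights are illustrative assumptions.

TU_SIZES = [4, 8, 16]                       # assumed minimum TU sizes

def sigma_d(min_tu):
    return 0.92 - min_tu / 40.0             # intra case, Eq. 10

def sigma_r(qp):
    return max((qp - 17) / 2.0, 0.01)       # range parameter from QP

def build_3d_lut():
    lut = {}
    for t, min_tu in enumerate(TU_SIZES):
        spatial = -1.0 / (2 * sigma_d(min_tu) ** 2)   # plus-shaped neighbour
        for qp in range(52):
            inv_2sr2 = 1.0 / (2 * sigma_r(qp) ** 2)
            for d in range(1023):
                w = math.exp(spatial - d * d * inv_2sr2)
                lut[(d, qp, t)] = round(w * 255)      # 8 bits per value
    return lut

lut = build_3d_lut()
print(len(lut))  # 1023 * 52 * 3 = 159588 values
```

This brute-force table is the baseline whose size the following optimizations reduce.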
According to one embodiment, one possible optimization is to realize that not all differences are needed. The value of the weight in Equation 1 only depends on the absolute value of the difference between two luma values (or intensity values). If I(i,j) varies between 0 and 1023 and I(k,l) also varies between 0 and 1023, there are only 1023 possible values of the absolute value ||I(i,j) − I(k,l)||. Hence the intensity dimension of the LUT does not need to be larger than 1023.
However, according to an embodiment, it can be made shorter than this. For values of ||I(i,j) − I(k,l)|| over a certain threshold value absmax, the resulting weight ω is so small that it is possible to set the weight to zero without much of an error. Hence, in some embodiments it is possible to tabulate the intensity dimension of the LUT only from zero to absmax, i.e. only for absolute difference values which are below or equal to the threshold value absmax.
According to another embodiment, another possible optimization is to make use of the fact that it is possible to write the weight as:

ω(i, j, k, l) = e^(−((i − k)² + (j − l)²)/(2σd²) − ||I(i,j) − I(k,l)||²/(2σr²))

= e^(−((i − k)² + (j − l)²)/(2σd²)) · e^(−||I(i,j) − I(k,l)||²/(2σr²))    (Eq. 13)

It is therefore possible to have a first LUT (LUTFIRST) for the first expression, e^(−((i − k)² + (j − l)²)/(2σd²)), and another LUT, a second LUT (LUTSECOND), for the second expression, e^(−||I(i,j) − I(k,l)||²/(2σr²)).
The first LUT (LUTFIRST) stores the value e^(−((i − k)² + (j − l)²)/(2σd²)), which depends on (i − k)² + (j − l)² and σd. However, in an embodiment using a plus shape of the filter, (i − k)² + (j − l)² is always equal to 1 for every pixel except the middle pixel. The expression therefore only depends on σd for these pixels, and σd depends in turn only on the TU-size, of which there is a predetermined number of sizes, for example three TU sizes. Thus, if we are not in the middle pixel, only three values need to be stored.

If we are in the middle pixel, (i − k)² + (j − l)² equals 0 and the entire expression becomes equal to 1.

Hence, for the first expression only four values need to be stored, i.e. three for the different TU sizes and one representing the integer 1. Thus, according to one embodiment the filter has a plus-shaped filter aperture, whereby the first expression of Equation 13, e^(−((i − k)² + (j − l)²)/(2σd²)), is represented in a first LUT (LUTFIRST) by a reduced set of values.
The reduced set of values may be stored in a one-dimensional LUT, such that the first LUT comprises a reduced LUT (LUTREDUCED) comprising T+1 values, wherein: a first value relates to the middle pixel and comprises the integer 1; and the T values relate to T different possible TU sizes.
The second LUT (LUTSECOND), relating to the second expression of Equation 13, i.e. e^(−||I(i,j) − I(k,l)||²/(2σr²)), depends both on the absolute difference in intensity ||I(i,j) − I(k,l)|| and on σr, which in turn depends on QP. Hence, for this a LUT is provided that spans all the absolute differences from 0 to absmax times the number of QPs.
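A sketch of this separable variant of Equation 13 (the cutoff absmax = 244 and the TU sizes are illustrative; the two factors become one tiny spatial LUT and one shared two-dimensional range LUT):

```python
import math

# Sketch: Eq. 13 split into a first LUT of 4 spatial values (the integer 1
# for the centre pixel plus one value per assumed TU size) and a second LUT
# over (|intensity difference|, QP) for the range factor.

TU_SIZES = [4, 8, 16]
ABS_MAX = 244                                # differences above this weigh 0

def sigma_r(qp):
    return max((qp - 17) / 2.0, 0.01)

lut_first = [1.0] + [math.exp(-1.0 / (2 * (0.92 - s / 40.0) ** 2))
                     for s in TU_SIZES]      # 4 values in total

lut_second = [[math.exp(-d * d / (2 * sigma_r(qp) ** 2))
               for qp in range(52)]
              for d in range(ABS_MAX + 1)]   # 245 x 52 = 12740 values

def weight(is_centre, tu_index, abs_diff, qp):
    if abs_diff > ABS_MAX:
        return 0.0                           # weight set to zero past absmax
    spatial = lut_first[0] if is_centre else lut_first[1 + tu_index]
    return spatial * lut_second[abs_diff][qp]

print(len(lut_first), len(lut_second) * len(lut_second[0]))  # 4 12740
```

The product of the two lookups reproduces the full 2D weight at a fraction of the storage, at the cost of one extra multiplication per pixel.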
For the intra case, a typical value of absmax can be 244, for example, which makes the maximum filter error smaller than 0.5. Smaller values of absmax are also possible. The number of possible QPs is, for example, 52, so the LUT could be of the size 52x245 = 12740 bytes if 8 bits are used per value. Thus the total size would be 12740 + 4 instead of 159588, which is a significant saving.

The above calculation is for the case of intra filtering. However, when inter filtering is required, different filter coefficients are required. If a 3D LUT is used to store the weights according to Equation 1, the LUT would need 159588 values for the intra case and 159588 for the inter case, or in total 159588*2 = 319176 values. However, according to the present embodiments, if the separated approach with two LUTs according to Equation 13 is used, where one is a one-dimensional LUT, for example of size 4 (3 values plus the 1.0), and the other is a two-dimensional LUT, for example of size (absmax+1)x(QP), one would at first think that 12744*2 = 25488 values are needed.
However, since only the formula for σd has been changed, the two-dimensional LUT can be reused, i.e. shared, for both intra and inter coding/decoding operations. Hence, with a shared LUT for this purpose, only a new one-dimensional LUT of size 4 is needed, and it is possible to make do with 12740+4+4 = 12748 values. That is significantly less than the 25488 values that would have been needed by a brute-force implementation which simply duplicated the behavior. A possible drawback of having two separable look-up tables according to Equation 13 is that an extra multiplication is needed. Hence, according to some embodiments, the weight calculation is implemented as a single 3D LUT. As described above, it is possible to tabulate only from 0 to absmax, so the 3D look-up table can be of the size (absmax+1)x(QP)x(TUsizes) or 245x52x3, which equals 38220.
According to some examples one such look-up table is used for the intra blocks and another one for the inter blocks, so in total 38220x2 = 76440 values are needed.
However, according to embodiments described herein, instead of duplicating the look-up table, it is possible to use one look-up table for the intra blocks and then the same look-up table for the inter blocks, modified using an approximation function, for example a scaling factor, e.g. multiplication by a constant, or a scaling factor and an offset.
From Equation 1 it is known that the value stored in the first LUT is:

ωintra(i, j, k, l) = e^(−((i − k)² + (j − l)²)/(2σd_intra²) − ||I(i,j) − I(k,l)||²/(2σr²))

where

σd_intra = 0.92 − min(TU block width, TU block height) / 40    (Eq. 10)
It is also known that the value that is required in the inter case is:

ωinter(i, j, k, l) = e^(−((i − k)² + (j − l)²)/(2σd_inter²) − ||I(i,j) − I(k,l)||²/(2σr²))

where

σd_inter = 0.72 − min(TU block width, TU block height) / 40    (Eq. 11)

Hence

ωinter(i, j, k, l) / ωintra(i, j, k, l) = e^(−((i − k)² + (j − l)²)/(2σd_inter²) − ||I(i,j) − I(k,l)||²/(2σr²)) / e^(−((i − k)² + (j − l)²)/(2σd_intra²) − ||I(i,j) − I(k,l)||²/(2σr²))
This can be simplified to

ωinter(i, j, k, l) / ωintra(i, j, k, l) = e^(−((i − k)² + (j − l)²)/(2σd_inter²) + ((i − k)² + (j − l)²)/(2σd_intra²))

In the case when we are not in the middle pixel, (i − k)² + (j − l)² = 1, and this becomes

ωinter(i, j, k, l) / ωintra(i, j, k, l) = e^(−1/(2σd_inter²) + 1/(2σd_intra²)) = s    (Eq. 13)

This constant s (scaling factor) only depends on the minimum TU size, min(TU block width, TU block height), in this example, of which there are, for example, three possible values (ST = 3). Hence there are only three possible values of the scaling factor (s). It is therefore possible to calculate ωinter(i, j, k, l) as ωinter(i, j, k, l) = ωintra(i, j, k, l)*s, where s comes from a LUT of size 3.
The scaling factor (s) here is not in general going to be an integer; instead it will be a floating point number. As is known by a person skilled in the art, it may be advantageous to implement the multiplication of a number ω by a floating point constant (s) using fixed point arithmetic. In this case, instead of multiplying ω by (s), it is customary to multiply ω by s*2^k, add t = 2^(k−1) for rounding, followed by a downshift. Hence, instead of using:

ωinter(i, j, k, l) = ωintra(i, j, k, l) * s, we use

ωinter(i, j, k, l) = (ωintra(i, j, k, l) * sfix + t) » k,

where » denotes bitwise right shift, sfix = s * 2^k, t = 2^(k−1), and k is the precision of the fixed point operation, for example k = 8. It is also known to a person skilled in the art that, instead of storing s in the LUT and then executing the multiplication s * 2^k, preferably sfix is stored directly in the LUT, avoiding floating point multiplications altogether. Thus, according to some embodiments, the scaling factor (s) is a function of the minimum transform unit, TU, size, min(TU block width, TU block height). According to some embodiments, the scaling factor (s) comprises a predetermined number of possible values.
According to some embodiments, the scaling factor (s) comprises ST possible scaling values, and an approximated inter weighting coefficient ωinter(i, j, k, l), assigned for a pixel (k, l) to filter a pixel (i, j), is determined as:

ωinter(i, j, k, l) = ωintra(i, j, k, l)*s,

where the scaling factor (s) is derived from a scaling factor LUT (LUTSF).
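To illustrate, the three possible values of s can be computed directly from Eq. 10 and Eq. 11 (the TU sizes 4, 8 and 16 are assumptions for the sketch):

```python
import math

# Sketch: the scaling factor s = exp(-1/(2*sd_inter**2) + 1/(2*sd_intra**2)),
# one value per assumed minimum TU size, so that w_inter = w_intra * s.

def scale_factor(min_tu):
    sd_intra = 0.92 - min_tu / 40.0          # Eq. 10
    sd_inter = 0.72 - min_tu / 40.0          # Eq. 11
    return math.exp(-1.0 / (2 * sd_inter ** 2) + 1.0 / (2 * sd_intra ** 2))

LUT_SF = {tu: scale_factor(tu) for tu in (4, 8, 16)}   # LUT of size ST = 3
for tu, s in sorted(LUT_SF.items()):
    print(tu, round(s, 4))
```

All three values are below 1, consistent with inter blocks being filtered less than intra blocks.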
In another embodiment it may be sufficient to use the same value of the scaling factor (s) independently of TU size.
Thus, according to some embodiments the scaling factor (s) is independent of transform unit, TU, size. For example, in one embodiment the scaling factor (s) is a constant value. In yet another embodiment, improved results may be obtained by adding an offset, i.e., the value is calculated as:

ωinter(i, j, k, l) = ωintra(i, j, k, l)*s + p,

where p is an offset that depends on TU size.
Thus, in such an embodiment the approximation function comprises a scaling factor (s) for scaling an obtained weighting coefficient ω from the at least one shared LUT (LUTSHARED) by the scaling factor (s), and an offset value (p).
According to an embodiment, the weighting coefficients stored in a shared LUT (LUTSHARED) relate to intra weighting coefficients (ωintra) for use with intra coding/decoding operations, and an approximated inter weighting coefficient ωinter(i, j, k, l), assigned for a pixel (k, l) to filter a pixel (i, j), is determined as:

ωinter(i, j, k, l) = ωintra(i, j, k, l)*s + p
As mentioned above, the offset value (p) may be an offset that depends on transform unit, TU, size.
In another embodiment, neither the offset (p) nor the scaling factor (s) depends on TU size. In yet another embodiment, the filtering strength of the bilateral filter is lowered in another manner. Thus, according to some embodiments, one or more further scaling factors are used to lower the filtering strength of the filtering operation. For example, instead of multiplying the obtained ω by a scaling factor s, an offset is added to the absolute value calculation. Hence, instead of using ||I(i,j) − I(k,l)|| as the argument to the look-up table, ||I(i,j) − I(k,l)|| + o is used, where o is an offset. If o is large, a value is obtained as if the absolute difference was larger, which results in a smaller ω, which in turn gives a lower filtering strength.
In yet another embodiment, the absolute difference ||I(i,j) − I(k,l)|| is multiplied by a scaling factor q, according to q * ||I(i,j) − I(k,l)||. If the scaling factor q is larger than 1, this will have the effect of obtaining a value as if the absolute difference was larger, which results in a smaller ω, which in turn gives a lower filtering strength.
In yet another embodiment, the absolute difference ||I(i,j) − I(k,l)|| is first multiplied by a scaling factor q, and then an offset r is added, according to q * ||I(i,j) − I(k,l)|| + r. This has the effect of obtaining a value as if the absolute difference was larger, which results in a smaller ω, which in turn gives a lower filtering strength.
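The variants can be sketched as follows (the value of σr and the test difference are illustrative):

```python
import math

# Sketch: lower the filter strength by remapping the absolute intensity
# difference before the range-weight lookup: |dI| -> q * |dI| + r.
# Offset-only is q = 1, scale-only is r = 0. Values are illustrative.

SIGMA_R = 8.5                                # e.g. QP 34: (34 - 17) / 2

def range_weight(abs_diff):
    return math.exp(-abs_diff ** 2 / (2 * SIGMA_R ** 2))

def adjusted_weight(abs_diff, q=1.0, r=0.0):
    return range_weight(q * abs_diff + r)

w = range_weight(5)
assert adjusted_weight(5, r=3.0) < w         # offset o > 0: weaker filtering
assert adjusted_weight(5, q=2.0) < w         # scale q > 1: weaker filtering
assert adjusted_weight(5, q=0.5) > w         # scale q < 1: stronger filtering
print(round(w, 4))
```

Remapping the LUT argument this way adjusts the effective strength without touching the shared table itself.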
These factors s, o, q and r can be signalled to the decoder. For instance, it is possible to signal them in the PPS, or in the SPS, or in the slice header.
According to some embodiments, it is also possible to make the scaling factor q smaller than 1 in order to get stronger filtering results.
It is noted that, in the above embodiments, it has been assumed that QP can be any value from 0 to 51, and hence a 3D look-up table as the one described above will have 1023*52*3 = 159588 values. However, for QPs smaller than, for example, 18, σr becomes 0.01, which means the expression

e^(−||I(i,j) − I(k,l)||²/(2σr²))

becomes equal to e^(−5000 * ||I(i,j) − I(k,l)||²), which is a very small number unless the pixels are equal. Hence the filter is effectively turned off when the QP is smaller than 18. Hence, in another embodiment, the filter is turned completely off when the QP is smaller than 18. This means that no filtering needs to happen, such that the filtering can be avoided when QP is smaller than 18, for example to save complexity. Also, the LUT sizes can be reduced. As an example, instead of having 1023*52*3 = 159588 values, it is sufficient with 1023*(52−17)*3 = 1023*35*3, i.e. 107415 values instead of 159588 values.
Furthermore, in some examples, different values of absmax can be used for different QPs. As an example, if QP is limited to 35 (halfway between 18 and 52), the absmax value may instead be 122 (half of the previously mentioned absmax value of 244 in the example above). Hence two 3D LUTs may be used, one for QP < 35 and one for QP >= 35. The first LUT is thus of size 122*17*3 (6222 values) and the second one is of size 244*18*3 (13176 values). These two combined LUTs would result in 6222+13176 = 19398 values, which is smaller than if only one absmax of 244 is used, which would produce one table of 244*35*3 = 25620 values.
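The size arithmetic for the QP-split tables can be checked directly (numbers from the example above):

```python
# Sketch: combined size of two QP-split 3D LUTs versus a single table,
# using the numbers from the example above.

TU_SIZES = 3
low = 122 * 17 * TU_SIZES        # absmax 122 for the 17 QP values below 35
high = 244 * 18 * TU_SIZES       # absmax 244 for the 18 QP values from 35 up
single = 244 * 35 * TU_SIZES     # one table with absmax 244 for all 35 QPs

print(low, high, low + high, single)  # 6222 13176 19398 25620
```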
It is further possible to have more values of absmax instead of just two, further reducing the size of the combined LUTs.
From the above it can be seen that different embodiments can be used to reduce the number and/or size of LUT used in filtering operations.
Figure 9 shows an example of a filter 900 according to an embodiment, whereby the filter is implemented as a data processing system. The data processing system includes at least one processor 901 that is coupled to a network interface 905 via an interconnect. The at least one processor 901 is also coupled to a memory 903 via the interconnect. The memory 903 can be implemented by a hard disk drive, flash memory, or read-only memory and stores computer-readable instructions.
The at least one processor 901 executes the computer-readable instructions and implements the functionality described in the embodiments above. The network interface 905 enables the data processing system 900 to communicate with other nodes in a network. Alternative examples may include additional components responsible for providing additional functionality, including any functionality described above and/or any functionality necessary to support the solution described herein.
The filter 900 may be operative to filter a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value, the filter being configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value.
The filter 900 comprises at least one shared LUT (LUTSHARED) 907.
The filter 900 may be operative to store weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs; share at least one LUT (LUTSHARED, 907) for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, by: storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT (LUTSHARED); and deriving approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT (LUTSHARED, 907).
The filter 900 may be further operative to perform filtering operations as described here, and defined in the appended claims.
Figure 10 shows an example of a decoder 1000 that comprises a modifying means 1001, for example a filter as described herein, configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value, wherein weighting coefficients for use by the decoder for modifying pixel values are stored in one or more look-up tables, LUTs. The decoder 1000 comprises at least one shared LUT (LUTSHARED) 1003 for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation.
The decoder 1000 is operative to: store weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT (LUTSHARED, 1003); and derive approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT (LUTSHARED, 1003).
The decoder 1000 may be further operative to perform a filtering method as described herein, and as defined in the appended claims.
It is noted that at least one of the parameters σd and σr described above may also depend on at least one of: quantization parameter, quantization scaling matrix, transform width, transform height, picture width, picture height, or a magnitude of a negative filter coefficient used as part of inter/intra prediction.
The embodiments described herein provide an improved filter for video decoding. It is noted that any of the embodiments relating to decoding may also be used for coding.
Figure 1 1 is a schematic block diagram of a video encoder 40 according to an embodiment. A current sample block, also referred to as pixel block or block of pixels, is predicted by performing a motion estimation by a motion estimator 50 from already encoded and reconstructed sample block(s) in the same picture and/or in reference picture(s). The result of the motion estimation is a motion vector in the case of inter prediction. The motion vector is utilized by a motion compensator 50 for outputting an inter prediction of the sample block.
An intra predictor 49 computes an intra prediction of the current sample block. The outputs from the motion estimator/compensator 50 and the intra predictor 49 are input to a selector 51 that either selects intra prediction or inter prediction for the current sample block. The output from the selector 51 is input to an error calculator in the form of an adder 41 that also receives the sample values of the current sample
block. The adder 41 calculates and outputs a residual error as the difference in sample values between the sample block and its prediction, i.e., the prediction block. The error is transformed in a transformer 42, such as by a discrete cosine transform (DCT), and the resulting coefficients are quantized by a quantizer 43, followed by coding in an encoder 44, such as an entropy encoder. In inter coding, the estimated motion vector is also brought to the encoder 44 for generating the coded representation of the current sample block.
The transformed and quantized residual error for the current sample block is also provided to an inverse quantizer 45 and inverse transformer 46 to reconstruct the residual error. This residual error is added by an adder 47 to the prediction output from the motion compensator 50 or the intra predictor 49 to create a reconstructed sample block that can be used as prediction block in the prediction and coding of other sample blocks. This reconstructed sample block is first processed by a device 100 for filtering of a picture according to the embodiments in order to suppress deringing artifacts. The modified, i.e., filtered, reconstructed sample block is then temporarily stored in a Decoded Picture Buffer (DPB) 48, where it is available to the intra predictor 49 and the motion estimator/compensator 50. The modified, i.e. filtered, reconstructed sample block from device 100 is also coupled directly to the intra predictor 49.
If the deringing filtering instead is applied following inverse transform, the device 100 is preferably instead arranged between the inverse transformer 46 and the adder 47.
An embodiment relates to a video decoder comprising a device for filtering of a picture according to the embodiments. Figure 12 is a schematic block diagram of a video decoder 60 comprising a device 100 for filtering of a picture according to the embodiments. The video decoder 60 comprises a decoder 61 , such as an entropy decoder, for decoding a bitstream comprising an encoded representation of a sample block to get a set of quantized and transformed coefficients. These coefficients are dequantized in an inverse quantizer 62 and inverse transformed by an inverse transformer 63 to get a decoded residual error.
The decoded residual error is added in an adder 64 to the sample prediction values of a prediction block. The prediction block is determined by a motion estimator/compensator 67 or intra predictor 66, depending on whether inter or intra prediction is performed. A selector 68 is thereby interconnected to the adder 64 and the motion estimator/compensator 67 and the intra predictor 66. The resulting decoded sample block output from the adder 64 is input to a device 100 for filtering of a picture or part of a picture in order to suppress and combat any ringing artifacts. The filtered sample block enters a DPB 65 and can be used as prediction block for subsequently decoded sample blocks. The DPB 65 is thereby connected to the motion estimator/compensator 67 to make the stored sample blocks available to the motion estimator/compensator 67. The output from the device 100 is preferably also input to the intra predictor 66 to be used as an unfiltered prediction block. The filtered sample block is furthermore output from the video decoder 60, such as output for display on a screen.
If the deringing filtering instead is applied following inverse transform, the device 100 is preferably instead arranged between the inverse transformer 63 and the adder 64.
One idea of embodiments of the present invention is to introduce a deringing filter into the Future Video Codec, i.e., the successor to HEVC.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim, "a" or "an" does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.
ANNEX A
Joint Video Exploration Team (JVET) Document: JVET-E0032 of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29 WG 11
5th Meeting: Geneva, CH, 12-20 January 2017
Title: Bilateral filter strength based on prediction mode
Status: Input Document to JVET
Purpose: Proposal
Author(s) or Contact(s): Jacob Strom, Per Wennersten, Kenneth Andersson, Jack Enhorn
Tel: +46702672192
Email: jacob.strom@ericsson.com
Address: Farogatan 6, 16480 Stockholm, Sweden
Source: Ericsson
Abstract
An updated version of the bilateral filter proposed in JVET-D0069 is presented. The filter strength is now lower for blocks using inter prediction, and it is reported that this results in additional BD rate decreases of 0% / -0.04% / -0.88% / -0.74% for AI / RA / LDB / LDP at no measurable added complexity. In comparison with JEM-4.0 the BD rate results are reported to be -0.42% / -0.46% / -0.52% / -0.54% with encoding time increases of 7% / 3% / 5% / 4% and decoding time increases of 5% / 3% / 5% / 3%. For screen content (class F) the BD rate improvements are -1.84% / -1.27% / -1.31% / -1.62%.
1 Introduction
At the JVET meeting in Chengdu the use of bilateral filtering after the inverse transform was proposed in JVET-D0069 [1] and was also tested in EE2 on top of JEM-4.0 in JVET-E0031 [2].
A bilateral filter works by basing the filter weights not only on the distance to neighboring samples but also on their values. A sample located at (i, j) will be filtered using its neighboring samples (k, l). The weight ω(i, j, k, l) is the weight assigned to sample (k, l) when filtering sample (i, j), and it is defined as

    ω(i, j, k, l) = exp( -((i-k)² + (j-l)²) / (2σ_d²) - ||I(i, j) - I(k, l)||² / (2σ_r²) )   (Eq. 1)
Here, I(i, j) and I(k, l) are the original reconstructed intensity values of samples (i, j) and (k, l), respectively. σ_d is the spatial parameter, and σ_r is the range parameter. The properties (or strength) of the bilateral filter are controlled by these two parameters. Samples located closer to the sample to be filtered, and samples having a smaller intensity difference to the sample to be filtered, have larger weights than samples further away and with larger intensity differences. In the original contribution JVET-D0069 it is proposed to set σ_d based on the transform unit size (Eq. 2) and σ_r based on the QP used for the current block (Eq. 3), according to

    σ_d = 0.92 - min(TU block width, TU block height) / 40   (Eq. 2)

    σ_r = max( (QP - 17) / 2, 0.01 )   (Eq. 3)
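By way of illustration only, the parameter formulas and the per-sample weight of Eqs. 1-3 can be sketched in a few lines of Python (a simplified scalar sketch; the function names are ours, not from the proposal):

```python
import math

def sigma_d_intra(tu_width, tu_height):
    # Eq. 2: spatial parameter from the smaller TU dimension
    return 0.92 - min(tu_width, tu_height) / 40.0

def sigma_r(qp):
    # Eq. 3: range parameter from the QP of the current block
    return max((qp - 17) / 2.0, 0.01)

def bilateral_weight(i, j, k, l, I_ij, I_kl, s_d, s_r):
    # Eq. 1: weight combines spatial distance and intensity difference
    spatial = -((i - k) ** 2 + (j - l) ** 2) / (2.0 * s_d ** 2)
    rng = -((I_ij - I_kl) ** 2) / (2.0 * s_r ** 2)
    return math.exp(spatial + rng)
```

A sample at the same position with the same intensity gets weight 1, and the weight decays with either distance or intensity difference.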
The application of the bilateral filter after the inverse transform can improve the objective coding efficiency for all intra and random access configurations. In this contribution we further improve the bilateral filter performance by reducing the filter strength for inter predicted blocks compared to intra predicted blocks.
2 Filter strength dependent of prediction mode
Inter predicted blocks typically have less residual than intra predicted blocks and therefore it makes sense to filter the reconstruction of inter predicted blocks less.
We therefore set the filter strength for intra predicted blocks as before, but for inter predicted blocks we use the following spatial parameter:
    σ_d = 0.72 - min(TU block width, TU block height) / 40   (Eq. 4)
3 Results
All intra Main 10
                     Y        U        V       EncT   DecT
[rows for classes A1, A2, B, C and D are not recoverable from the source image]
Class E             -0.56%    0.03%    0.10%   107%   106%
Overall             -0.42%    0.11%    0.10%   107%   105%
Class F (optional)  -1.84%   -0.28%   -0.27%   104%   105%
Random access Main 10
                     Y        U        V       EncT   DecT
Class A1            -0.24%    0.01%    0.05%   105%   102%
Class A2            -0.60%   -0.36%   -0.29%   103%   103%
Class B             -0.45%    0.06%   -0.13%   104%   104%
Class C             -0.59%   -0.13%   -0.19%   102%   101%
Class D             -0.44%   -0.23%    0.15%   104%   105%
Class E
Overall (Ref)       -0.46%   -0.12%   -0.08%   103%   103%
Class F (optional)  -1.27%   -0.24%   -0.05%   104%   104%
Low delay B
                     Y        U        V       EncT   DecT
Class A1
Class A2
Class B             -0.37%    0.34%    0.26%   108%   107%
Class C             -0.55%    0.13%    0.09%   105%   106%
Class D             -0.64%    0.43%    0.14%   104%   104%
Class E             -0.58%    0.98%   -0.57%   102%   103%
Overall (Ref)       -0.52%    0.43%    0.03%   105%   105%
Class F (optional)  -1.31%   -0.04%    0.20%   105%   104%
Low delay P
                     Y        U        V       EncT   DecT
Class A1
Class A2
Class B             -0.46%   -0.09%   -0.12%   105%   104%
Class C             -0.61%   -0.08%    0.12%   106%   106%
Class D             -0.66%    0.19%    0.17%   100%    95%
Class E             -0.41%   -0.31%    0.58%   108%   108%
Overall (Ref)       -0.54%   -0.06%    0.14%   104%   103%
Class F (optional)  -1.62%    0.00%   -0.48%   107%   106%
4 Conclusions
Due to the large improvements primarily seen for low delay settings, we suggest that this modified version of the bilateral filter be considered for inclusion instead of the original version proposed in JVET-D0069.
5 References
[1] J. Ström, P. Wennersten, Y. Wang, K. Andersson, J. Samuelsson, "Bilateral Filter After Inverse Transform", JVET-D0069, Chengdu, China, 15-21 October 2016.
[2] J. Ström, P. Wennersten, K. Andersson, J. Enhorn, "EE2-JVET-D0069 Bilateral Filter Test1, Test2 and Test3", JVET-E0031, Geneva, Switzerland, January 2017.
6 Patent rights declaration(s)
Ericsson may have current or pending patent rights relating to the technology described in this contribution and, conditioned on reciprocity, is prepared to grant licenses under reasonable and non-discriminatory terms as necessary for implementation of the resulting ITU-T Recommendation | ISO/IEC International Standard (per box 2 of the ITU-T/ITU-R/ISO/IEC patent statement and licensing declaration form).

Claims

1. A method, performed by a filter, for filtering of a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value, wherein a pixel value is modified by a weighted combination of the pixel value and at least one spatially neighboring pixel value, the method comprising:
storing weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs;
sharing at least one LUT for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, by:
storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT; and
deriving approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT.
2. A method as claimed in claim 1, wherein the approximation function comprises a scaling factor (s), for scaling an obtained weighting coefficient ω from the at least one shared LUT by the scaling factor (s).
3. A method as claimed in claim 2, wherein the scaling factor (s) comprises:
Figure imgf000044_0001
ωίη£χα(ί> j> k, I) where (ωίηίΓα) relates to intra weighting coefficients and (ωίηίεΓ) relates inter weighting coefficients, and σά relates to a spatial parameter and or relates to a range parameter.
4. A method as claimed in claim 2 or 3, wherein the scaling factor (s) comprises a predetermined number of possible values.
5. A method as claimed in claim 4, wherein the weighting coefficients stored in the shared LUT relate to intra weighting coefficients (ω_intra) for use with intra decoding operations.
6. A method as claimed in claim 5, wherein the scaling factor (s) comprises ST possible scaling values, and wherein an approximated inter weighting coefficient ω_inter(i, j, k, l), assigned for a pixel (k, l) to filter a pixel (i, j), is determined as:

    ω_inter(i, j, k, l) = ω_intra(i, j, k, l) * s

where the scaling factor (s) is derived from a scaling factor LUT of size ST.
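Purely for illustration (not part of the claimed subject matter), the scaling approach could be sketched as follows; all table contents below are hypothetical placeholders, and a real codec would index the tables by QP, TU size and pixel difference:

```python
# Hypothetical LUT contents, for illustration only.
INTRA_LUT = [1.00, 0.85, 0.55]        # omega_intra indexed by |I(i,j) - I(k,l)|
SCALE_LUT = [0.60, 0.70, 0.80, 0.90]  # ST = 4 candidate scaling values s

def inter_weight(diff, scale_idx):
    # omega_inter = omega_intra * s: only the small SCALE_LUT is stored
    # in addition to the shared intra table
    return INTRA_LUT[diff] * SCALE_LUT[scale_idx]
```

The point of the construction is storage: instead of a full second table of inter weights, only ST extra scaling values are kept.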
7. A method as claimed in claim 2, wherein the scaling factor (s) is independent of transform unit, TU, size.
8. A method as claimed in claim 7, wherein the scaling factor (s) is a constant value.
9. A method as claimed in claim 1, wherein the approximation function comprises a scaling factor (s) for scaling an obtained weighting coefficient ω from the at least one shared LUT by the scaling factor (s), and an offset value (p).
10. A method as claimed in claim 9, wherein the weighting coefficients stored in the shared LUT relate to intra weighting coefficients (ω_intra) for use with intra decoding operations, and wherein an approximated inter weighting coefficient ω_inter(i, j, k, l), assigned for a pixel (k, l) to filter a pixel (i, j), is determined as:

    ω_inter(i, j, k, l) = ω_intra(i, j, k, l) * s + p
11. A method as claimed in claim 10, wherein the offset value (p) is an offset that depends on transform unit, TU, size.
12. A method as claimed in claim 10, wherein neither the offset value (p) nor the scaling factor (s) depends on the transform unit, TU, size.
13. A method as claimed in any one of the preceding claims, wherein a weighting coefficient ω(i, j, k, l), assigned for a pixel (k, l) to filter a pixel (i, j), is defined as:

    ω(i, j, k, l) = exp( -((i-k)² + (j-l)²) / (2σ_d²) - ||I(i, j) - I(k, l)||² / (2σ_r²) )   (Eq. 1)

where σ_d is a spatial parameter and σ_r is a range parameter, and wherein equation 1 is rewritten as first and second separate expressions:

    ω(i, j, k, l) = exp( -((i-k)² + (j-l)²) / (2σ_d²) ) * exp( -||I(i, j) - I(k, l)||² / (2σ_r²) )   (Eq. 13)

and wherein a first LUT (LUTFIRST) is provided for storing weighting coefficients relating to the first expression of equation 13:

    exp( -((i-k)² + (j-l)²) / (2σ_d²) )

and wherein a second LUT (LUTSECOND) is provided for storing weighting coefficients relating to the second expression of equation 13:

    exp( -||I(i, j) - I(k, l)||² / (2σ_r²) )

and wherein the second LUT (LUTSECOND) forms the shared LUT (LUTSHARED) for storing weighting coefficients relating to the second expression.
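As an illustrative sketch (names and sizes are ours, not from the application), the two-LUT factorization of Eq. 13 separates the spatial and range parts so that only the range table need be shared between intra and inter operation:

```python
import math

def build_spatial_lut(offsets, sigma_d):
    # First expression of Eq. 13, indexed by the spatial offset (i-k, j-l)
    return {o: math.exp(-(o[0] ** 2 + o[1] ** 2) / (2.0 * sigma_d ** 2))
            for o in offsets}

def build_range_lut(sigma_r, absmax):
    # Second expression of Eq. 13, indexed by |I(i,j) - I(k,l)|;
    # this table plays the role of the shared LUT
    return [math.exp(-(d * d) / (2.0 * sigma_r ** 2)) for d in range(absmax + 1)]

def weight(offset, diff, spatial_lut, range_lut):
    # Eq. 13: the full weight is the product of the two table entries
    return spatial_lut[offset] * range_lut[diff]
```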
14. A method as claimed in claim 13, wherein the filter comprises a plus shape filter aperture, and whereby the second expression of equation 13,

    exp( -||I(i, j) - I(k, l)||² / (2σ_r²) )

is represented by: the number of quantization parameters, QP, times the number of difference values ||I(i, j) - I(k, l)||.
15. A method as claimed in any one of claims 13 to 14, wherein the filter comprises a plus shape filter aperture, and whereby the first expression of equation 13,

    exp( -((i-k)² + (j-l)²) / (2σ_d²) )

is represented in a first LUT (LUTFIRST) by a reduced set of values.
16. A method as claimed in claim 15, wherein the reduced set of values are stored in a one-dimensional LUT, such that the first LUT (LUTFIRST) comprises a reduced LUT (LUTREDUCED) comprising T+1 values, wherein: a first value relates to a middle pixel value comprising the integer 1; and the remaining T values relate to T different possible TU sizes.
17. A method as claimed in any one of claims 13 to 16, wherein I(i, j) and I(k, l) represent original intensity levels of pixels (i, j) and (k, l) respectively, which, after the weighting coefficients are obtained, are normalized to provide a final pixel value I_D(i, j), whereby I_D is the filtered intensity of pixel (i, j), and wherein the final pixel value I_D(i, j) is given by:

    I_D(i, j) = Σ_{k,l} I(k, l) * ω(i, j, k, l) / Σ_{k,l} ω(i, j, k, l)

and wherein the method comprises only storing weights based on pixel difference values ||I(i, j) - I(k, l)|| in a LUT for pixel differences which are below a threshold value (absmax).
18. A method as claimed in claim 17, wherein the LUT storing pixel difference values ||I(i, j) - I(k, l)|| spans all absolute pixel difference values ||I(i, j) - I(k, l)|| from zero to the threshold value (absmax), times the number of quantization parameters, QPs.
19. A method as claimed in any one of claims 2 to 18, wherein one or more further scaling factors are used to lower a filtering strength of the filtering operation.
20. A method as claimed in any one of the preceding claims, wherein at least one of the parameters σ_d and σ_r also depends on at least one of: quantization parameter, quantization scaling matrix, transform width, transform height, picture width, picture height, a magnitude of a negative filter coefficient used as part of inter/intra prediction.
21. A filter, for filtering of a picture of a video signal, wherein the picture comprises pixels, each pixel being associated with a pixel value, the filter being configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value, wherein the filter is operative to:
store weighting coefficients for use by the filter in modifying pixel values, the weighting coefficients being stored in one or more look up tables, LUTs;
share at least one LUT for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation, by:
storing weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT; and
deriving approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT.
22. A filter as claimed in claim 21, wherein the filter is further configured to perform a method as defined in any one of claims 2 to 20.
23. A decoder comprising:
a modifying means configured to modify a pixel value by a weighted combination of the pixel value and at least one spatially neighboring pixel value, wherein weighting coefficients for use by the decoder for modifying pixel values are stored in one or more look up tables, LUTs; and
at least one shared LUT for storing weighting coefficients for use by either an intra decoding operation or an inter decoding operation:
wherein the decoder is operative to:
store weighting coefficients relating to one of the intra decoding operation or inter decoding operation in the at least one shared LUT; and derive approximated weighting coefficients for the other of the intra decoding operation or inter decoding operation using an approximation function to modify the weighting coefficients stored in the at least one shared LUT.
24. A decoder as claimed in claim 23, wherein the decoder is further operative to perform a method as claimed in any one of claims 2 to 20.
25. A computer program comprising instructions which, when executed on at least one processor, cause the at least one processor to carry out the method according to any one of claims 1 to 20.
26. A computer program product comprising a computer-readable medium with the computer program as claimed in claim 25.
PCT/EP2018/050723 2017-01-19 2018-01-12 Filtering of video data using a shared look-up table Ceased WO2018134128A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762448007P 2017-01-19 2017-01-19
US62/448007 2017-01-19

Publications (1)

Publication Number Publication Date
WO2018134128A1 true WO2018134128A1 (en) 2018-07-26

Family

ID=60953885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/050723 Ceased WO2018134128A1 (en) 2017-01-19 2018-01-12 Filtering of video data using a shared look-up table

Country Status (1)

Country Link
WO (1) WO2018134128A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112435156A (en) * 2020-12-08 2021-03-02 烟台艾睿光电科技有限公司 Image processing method, device, equipment and medium based on FPGA
WO2021063270A1 (en) * 2019-10-04 2021-04-08 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Encoding and decoding method, apparatus and communication system
CN113643198A (en) * 2021-07-22 2021-11-12 海宁奕斯伟集成电路设计有限公司 Image processing method, device, electronic device and storage medium
CN114424541A (en) * 2019-08-16 2022-04-29 苹果公司 Adaptive bilateral filtering using look-up tables
CN118870026A (en) * 2019-10-04 2024-10-29 Oppo广东移动通信有限公司 Image prediction method, encoder, decoder and storage medium

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
ADRIAN GRANGE ET AL: "VP9 Bitstream & Decoding Process Specification", 1 March 2016 (2016-03-01), XP055347356, Retrieved from the Internet <URL:https://storage.googleapis.com/downloads.webmproject.org/docs/vp9/vp9-bitstream-specification-v0.6-20160331-draft.pdf> [retrieved on 20170217] *
DONG-QING ZHANG ET AL: "Recursive bilateral filter for encoder-integrated video denoising", VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2012 IEEE, IEEE, 27 November 2012 (2012-11-27), pages 1 - 6, XP032309227, ISBN: 978-1-4673-4405-0, DOI: 10.1109/VCIP.2012.6410813 *
J. STROM; P. WENNERSTEN; K. ANDERSSON; J. ENHORN: "EE2-JVET-D0069 Bilateral Filter Test1, Test2 and Test3", JVET-E0031, January 2017 (2017-01-01)
J. STROM; P. WENNERSTEN; Y. WANG; K. ANDERSSON; J. SAMUELSSON: "Bilateral Filter After Inverse Transform", JVET-D0069, 15 October 2016 (2016-10-15)
JACOB STRÖM ET AL: "Bilateral filter after inverse transform", 4. JVET MEETING; 15-10-2016 - 21-10-2016; CHENGDU; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/,, no. JVET-D0069, 6 October 2016 (2016-10-06), XP030150302 *
STRÖM J ET AL: "Bilateral filter strength based on prediction mode", 5. JVET MEETING; 12-1-2017 - 20-1-2017; GENEVA; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/,, no. JVET-E0032, 3 January 2017 (2017-01-03), XP030150498 *
STRÖM J ET AL: "EE2-JVET-E0032 Bilateral filter Test 1, Test2", 6. JVET MEETING; 31-3-2017 - 7-4-2017; HOBART; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/,, no. JVET-F0034, 23 March 2017 (2017-03-23), XP030150687 *
ZHANG L: "Cross-check of JVET-E0032 Bilateral filter strength based on prediction mode", 5. JVET MEETING; 12-1-2017 - 20-1-2017; GENEVA; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/,, no. JVET-E0092, 11 January 2017 (2017-01-11), XP030150585 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114424541A (en) * 2019-08-16 2022-04-29 苹果公司 Adaptive bilateral filtering using look-up tables
CN114424541B (en) * 2019-08-16 2024-05-31 苹果公司 System, method, and storage medium for compressing and reconstructing compressed video
US12108036B2 (en) 2019-08-16 2024-10-01 Apple Inc. Adaptive bilateral filtering using look-up tables
WO2021063270A1 (en) * 2019-10-04 2021-04-08 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Encoding and decoding method, apparatus and communication system
US11388407B2 (en) 2019-10-04 2022-07-12 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Encoding and decoding method, apparatus and communication system
US11785221B2 (en) 2019-10-04 2023-10-10 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Encoding and decoding method, apparatus and communication system
CN118870026A (en) * 2019-10-04 2024-10-29 Oppo广东移动通信有限公司 Image prediction method, encoder, decoder and storage medium
CN112435156A (en) * 2020-12-08 2021-03-02 烟台艾睿光电科技有限公司 Image processing method, device, equipment and medium based on FPGA
WO2022121077A1 (en) * 2020-12-08 2022-06-16 烟台艾睿光电科技有限公司 Image processing method, apparatus and device based on fpga, and medium
CN112435156B (en) * 2020-12-08 2022-12-09 烟台艾睿光电科技有限公司 Image processing method, device, equipment and medium based on FPGA
CN113643198A (en) * 2021-07-22 2021-11-12 海宁奕斯伟集成电路设计有限公司 Image processing method, device, electronic device and storage medium

Similar Documents

Publication Publication Date Title
US11272175B2 (en) Deringing filter for video coding
US12363293B2 (en) Method and apparatus for video coding
US11122263B2 (en) Deringing filter for video coding
KR101752612B1 (en) Method of sample adaptive offset processing for video coding
KR101919394B1 (en) Parameter determination for exp-golomb residuals binarization for lossless intra hevc coding
CN107295342B (en) Video encoding apparatus
US9516348B2 (en) Method of applying edge offset
WO2018149995A1 (en) Filter apparatus and methods
US10999603B2 (en) Method and apparatus for video coding with adaptive clipping
WO2018134128A1 (en) Filtering of video data using a shared look-up table
KR102294016B1 (en) Video encoding and decoding method using deblocking fitering with transform skip and apparatus using the same
WO2018134362A1 (en) Filter apparatus and methods
US12425660B2 (en) Combining deblock filtering and another filtering for video encoding and/or decoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18700220

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18700220

Country of ref document: EP

Kind code of ref document: A1