
GB2475739A - Video decoding with error concealment dependent upon video scene change. - Google Patents


Info

Publication number
GB2475739A
GB2475739A GB0920926A GB0920926A GB2475739A GB 2475739 A GB2475739 A GB 2475739A GB 0920926 A GB0920926 A GB 0920926A GB 0920926 A GB0920926 A GB 0920926A GB 2475739 A GB2475739 A GB 2475739A
Authority
GB
United Kingdom
Prior art keywords
video
frame
encoded video
video data
data frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0920926A
Other versions
GB0920926D0 (en)
Inventor
Reijo Siira
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Inc
Original Assignee
Nokia Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Inc filed Critical Nokia Inc
Priority to GB0920926A priority Critical patent/GB2475739A/en
Publication of GB0920926D0 publication Critical patent/GB0920926D0/en
Publication of GB2475739A publication Critical patent/GB2475739A/en
Withdrawn legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/142Detection of scene cut or scene change
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/164Feedback from the receiver or from the transmission channel
    • H04N19/166Feedback from the receiver or from the transmission channel concerning the amount of transmission errors, e.g. bit error rate [BER]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/87Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/89Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder
    • H04N19/895Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving methods or arrangements for detection of transmission errors at the decoder in combination with error concealment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/414Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41407Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region
    • H04N5/147Scene change detection
    • H04N7/26143
    • H04N7/26234
    • H04N7/26898
    • H04N7/68

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method and apparatus for checking a change in scene in a video sequence comprising: a data stream parser to detect the absence of encoded video data frames (comprising an encoded video sequence) in a bitstream, 603; a video frame detector to determine whether the absent frames are associated with a change of video scene, 605; and an error concealment generator to select a type of error concealment process (spatial or temporal) dependent on whether the absent frames are associated with the change in video scene, 607. The change of video scene is preferably determined by comparing a frame count distance between two intra coded video frames to a previous count distance. The data frames are preferably contained in a data structure such as a video object plane (VOP) and also grouped into real time transport protocol (RTP) packets, wherein the packet header field used is an RTP time stamp or sequence number.

Description

A Scene Frame Checker

The present invention relates to apparatus for coding and decoding, and specifically but not only for coding and decoding of image and video signals.
A video codec comprises an encoder which transforms input video into a compressed representation suitable for storage and/or transmission and a decoder that can uncompress the compressed video representation back into a viewable form. Typically, the encoder discards some information in the original video sequence in order to represent the video in a more compact form, for example at a lower bit rate.
Typical video codecs, for example those following the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.263 and H.264 coding standards, encode video information in two phases. In the first phase, pixel values in a certain picture area or "block" are predicted. These pixel values can be predicted, for example, by motion compensation mechanisms, which involve finding and indicating an area in one of the previously encoded video frames (or a later coded video frame) that corresponds closely to the block being coded. Additionally, pixel values can be predicted by spatial mechanisms which involve finding and indicating a spatial region relationship.
The second phase is one of coding the error between the predicted block of pixels and the original block of pixels. This is typically accomplished by transforming the difference in pixel values using a specified transform. This transform is typically a Discrete Cosine Transform (DCT) or a variant thereof. After transforming the difference, the transformed difference is quantized and entropy encoded.
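The two-phase structure above can be sketched in a few lines of Python. This is a hedged illustration only, not any codec's actual method: a real encoder applies a 2-D DCT to the residual before quantising, whereas this sketch quantises the raw residual directly, and the function names and quantisation step are invented for the example.

```python
# Illustrative sketch of prediction-error coding (the "second phase").
# A real codec would transform the residual with a DCT before quantising;
# here the raw residual is quantised directly to keep the example short.

QP = 8  # assumed quantisation step, not a value from any standard


def encode_residual(original, predicted, qp=QP):
    """Quantise the prediction error between original and predicted pixels."""
    return [round((o - p) / qp) for o, p in zip(original, predicted)]


def decode_residual(levels, predicted, qp=QP):
    """Dequantise the levels and add back the prediction (decoder side)."""
    return [p + level * qp for p, level in zip(predicted, levels)]


original = [134, 116, 152, 102]   # original pixel block (flattened)
predicted = [118, 125, 128, 135]  # motion- or spatially-predicted block

levels = encode_residual(original, predicted)       # coarse residual levels
reconstructed = decode_residual(levels, predicted)  # close to the original
```

The reconstruction is close to, but not identical with, the original block, which is precisely the information loss the encoder accepts in exchange for a lower bit rate.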
The decoder reconstructs the output video by applying a prediction mechanism similar to that used by the encoder in order to form a predicted representation of the pixel blocks (using the motion or spatial information created by the encoder and stored in the compressed representation of the image) and prediction error decoding (the inverse operation of the prediction error coding, recovering the quantised prediction error signal in the spatial domain).
After applying pixel prediction and error decoding processes the decoder combines the prediction and the prediction error signals (the pixel values) to form the output video frame.
In typical video codecs, the motion information is indicated by motion vectors associated with each motion compensated image block. Each of these motion vectors represents the displacement of the image block in the picture to be coded (in the encoder) or decoded (at the decoder) and the prediction source block in one of the previously coded or decoded images (or pictures). In order to represent motion vectors efficiently, motion vectors are typically coded differentially with respect to block specific predicted motion vector. In a typical video codec, the predicted motion vectors are created in a predefined way, for example by calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
Typically video codecs may encode a video frame as one of two types. The first type may be known as intra frame coding whereby the frame may be encoded using prediction techniques which rely solely on pixel values within the frame such as spatial prediction. The second type may be known as inter frame coding which may exploit temporal correlations between successive video frames in addition to any spatial correlations within the frame. For example, an inter frame coded video frame may employ motion compensation in order to exploit temporal correlations between frames.
Video streams comprising a plurality of coded video frames may be sent over communication networks in the form of packets. These packets may be susceptible to delays and losses in the network, which can result in either the loss of a packet or the packet being received in a corrupted state. The effect of lost or damaged packets may cause corruption in the decoded video frame. However if the lost or damaged packet corresponds to a video frame which is used as a reference frame for temporal prediction then the effect of decoding a corrupted video frame may be propagated over several successive video frames.
A video decoder may deploy error concealment techniques in order to compensate for the loss or corruption of received video packets. Spatial error concealment is one such technique which may be typically adopted by video codecs. This technique uses interpolation between error free pixels within the same video frame in order to conceal any areas of the image which may be damaged as a result of lost or corrupted packets. A further technique of error concealment which may be found in video codecs is temporal error concealment. This technique relies on interpolation of error free pixels between neighbouring video frames in order to conceal damaged areas of the video image.
However, for video decoders deploying error concealment algorithms the correct error concealment technique should be used in order to correct the damaged areas within the video frame. In other words the technique used should be commensurate with the characteristics of the video frame.
For example, temporal error concealment should be used in order to conceal damage within a video frame whose image has essentially remained static or slowly changing over the duration of a number of frames. However, if the damaged frame is at the point of a scene change in the video stream then better results may be achieved by using spatial error concealment.
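The selection rule described in the two paragraphs above reduces to a single decision, sketched below. The names are illustrative, not from the patent; the rule itself (temporal concealment inside a continuing scene, spatial concealment at a scene change) is exactly the one the text states.

```python
# Minimal sketch of the concealment-type decision: temporal concealment when
# the lost frame falls inside a continuing scene, spatial concealment when
# it coincides with a scene change.

TEMPORAL = "temporal"  # interpolate from neighbouring frames
SPATIAL = "spatial"    # interpolate from error-free pixels in the same frame


def select_concealment(lost_frame_at_scene_change):
    """Pick the concealment type commensurate with the frame's context."""
    return SPATIAL if lost_frame_at_scene_change else TEMPORAL
```

A decoder applying temporal concealment across a scene cut would paste pixels from a completely different scene into the damaged frame, which is why the decision matters.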
This application proceeds from the consideration that in order to effectively conceal the effect of corrupted or lost packets in a particular decoded video frame the correct type of error concealment algorithm should be used.
Embodiments of the present invention aim to address the above problem.
There is provided according to a first aspect of the present invention a method comprising: detecting in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determining whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
Selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence may comprise at least one of: selecting a first type of error concealment process when the absent at least one encoded video data frame is not associated with a change in video scene of the video sequence; and selecting a second type of error concealment process when the absent at least one encoded video data frame is associated with a change in video scene of the video sequence.
The first type of error concealment process may be a temporal error concealment process, and the second type of error concealment process may be a spatial error concealment process.
Determining whether any of the determined absent at least one encoded video data frame is associated with a change in video scene of the video sequence may further comprise: detecting a first encoded video data frame of a first video frame type; detecting a further encoded video data frame of the first video frame type; determining a frame count distance between the first encoded video data frame and the further encoded video data frame; comparing the frame count distance to a previous frame count distance; and determining whether the comparison indicates that a change in video scene of the video sequence has occurred.
The change in video scene of the video sequence may be indicated by the frame count distance being smaller than the previous frame count distance.
The frame count distance may be adjusted to compensate for the at least one encoded video data frame by adding the number of the at least one encoded video data frame to the frame count distance.
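The scene-change test of the three paragraphs above can be sketched as follows: the frame-count distance between two successive intra-coded frames, compensated for any lost frames, is compared with the previous such distance, and a shorter distance suggests the encoder inserted an early intra frame because the scene changed. Function names, indices and the example intra period are invented for this illustration.

```python
# Hedged sketch of the intra-frame-distance scene-change test. An encoder
# normally inserts intra frames at a regular interval; a scene change forces
# an extra intra frame, shortening the intra-to-intra distance.


def intra_distance(first_intra_index, next_intra_index, lost_frame_count=0):
    """Frame-count distance between two intra frames, adjusted for losses
    by adding the number of absent frames to the count."""
    return (next_intra_index - first_intra_index) + lost_frame_count


def indicates_scene_change(distance, previous_distance):
    """A shorter intra-to-intra distance than before implies a scene change."""
    return distance < previous_distance


# Regular intra period of 10 received frames, then an intra frame arrives
# after only 3 received frames with 1 frame known to be lost in between:
previous = intra_distance(0, 10)
current = intra_distance(10, 13, lost_frame_count=1)
```

Without the lost-frame compensation, the decoder's count of received frames would systematically under-state the distance and could flag spurious scene changes after every loss.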
The encoded video data frame is preferably contained in a data structure, and wherein detecting an encoded video data frame of the first video frame type may comprise decoding at least one field in the data structure header.
The first video frame type is preferably an intra coded video frame.
The data structure is preferably a video object plane, and the at least one field in the data structure is preferably at least one of: video object plane start code; and video object plane coding type.
The plurality of encoded video data frames may be grouped into packets, wherein each packet comprises a packet header, and wherein detecting the absent at least one encoded video data frame may comprise: reading from a first data packet a first value of the data packet header, the first data packet comprising a first encoded video data frame; reading from a second data packet a second value of the data packet header, the second data packet comprising a second encoded video data frame; calculating a difference value between the first value of the data packet header of the first data packet and the second value of the data packet header of the second data packet; and determining the difference value is indicative of the absent at least one encoded video data frame between the first encoded video data frame and the second encoded video data frame.
The packets may be real time transport protocol packets, wherein the payload of each real time transport protocol packet comprises at least one data structure containing the encoded video data frame, and wherein the data packet header is preferably at least one of: a real time transport protocol time stamp; and a real time transport protocol sequence number.
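The header-difference loss detection described above can be sketched for RTP sequence numbers. The 16-bit width and monotonic increment of the RTP sequence number are standard (RFC 3550); the assumption that each packet carries exactly one encoded frame, and all names below, are made up for the example.

```python
# Hypothetical sketch of loss detection from RTP packet headers: consecutive
# 16-bit sequence numbers are differenced modulo 2**16 (to survive the
# wrap-around from 65535 back to 0); a difference greater than one reveals
# how many packets, and here how many encoded frames, are absent.


def missing_frames(first_seq, second_seq):
    """Number of packets absent between two successively received packets."""
    diff = (second_seq - first_seq) % 65536  # RTP sequence numbers are 16-bit
    return diff - 1


# missing_frames(100, 101) -> 0 (consecutive packets, nothing lost)
# missing_frames(100, 104) -> 3 (three packets lost)
# missing_frames(65534, 1) -> 2 (loss spanning the sequence wrap-around)
```

The same differencing could be done on the RTP timestamp instead, dividing by the per-frame timestamp increment; the sequence number is simply the more direct counter when one frame occupies one packet.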
According to a second aspect of the present invention there is provided an apparatus comprising at least one processor and at least one memory including computer code, the at least one memory and the computer code configured to, with the at least one processor, cause the apparatus to at least perform: detecting in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determining whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
According to an embodiment of the invention the apparatus caused to at least perform selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence may be further caused to perform the selecting of at least one of: a first type of error concealment process when the absent at least one encoded video data frame is not associated with a change in video scene of the video sequence; and a second type of error concealment process when the absent at least one encoded video data frame is associated with a change in video scene of the video sequence.
The first type of error concealment process is preferably a temporal error concealment process, and the second type of error concealment process is preferably a spatial error concealment process.
The apparatus caused to at least perform determining whether any of the determined absent at least one encoded video data frame is associated with a change in video scene of the video sequence may be further caused to perform: detecting a first encoded video data frame of a first video frame type; detecting a further encoded video data frame of the first video frame type; determining a frame count distance between the first encoded video data frame and the further encoded video data frame; comparing the frame count distance to a previous frame count distance; and determining whether the comparison indicates that a change in video scene of the video sequence has occurred.
The change in video scene of the video sequence is preferably indicated by the frame count distance being smaller than the previous frame count distance.
The frame count distance is preferably adjusted to compensate for the at least one encoded video data frame by adding the number of the at least one encoded video data frame to the frame count distance.
The encoded video data frame is preferably contained in a data structure, and the apparatus caused to perform detecting an encoded video data frame of the first video frame type is further caused to perform decoding at least one field in the data structure header.
The first video frame type is preferably an intra coded video frame.
The data structure is preferably a video object plane, and the at least one field in the data structure is preferably at least one of: video object plane start code; and video object plane coding type.
The plurality of encoded video data frames are preferably grouped into packets, wherein each packet may comprise a packet header, and wherein the apparatus caused to perform detecting the absent at least one encoded video data frame may be further caused to perform: reading from a first data packet a first value of the data packet header, the first data packet comprising a first encoded video data frame; reading from a second data packet a second value of the data packet header, the second data packet comprising a second encoded video data frame; calculating a difference value between the first value of the data packet header of the first data packet and the second value of the data packet header of the second data packet; and determining the difference value is indicative of the absent at least one encoded video data frame between the first encoded video data frame and the second encoded video data frame.
The packets are preferably real time transport protocol packets, wherein the payload of each real time transport protocol packet may comprise at least one data structure containing the encoded video data frame, and wherein the data packet header may be at least one of: a real time transport protocol time stamp; and a real time transport protocol sequence number.
According to a third aspect of the present invention there is provided an apparatus comprising: a data stream parser configured to detect in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; a video frame detector configured to determine whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and an error concealment generator configured to select a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
According to an embodiment of the invention the error concealment generator configured to select a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence may be further configured to select at least one of: a first type of error concealment process when the absent at least one encoded video data frame is not associated with a change in video scene of the video sequence; and a second type of error concealment process when the absent at least one encoded video data frame is associated with a change in video scene of the video sequence.
The first type of error concealment process is preferably a temporal error concealment process, and the second type of error concealment process is preferably a spatial error concealment process.
The video frame detector configured to determine whether any of the determined absent at least one encoded video data frame is associated with a change in video scene of the video sequence may be further configured to: detect a first encoded video data frame of a first video frame type; detect a further encoded video data frame of the first video frame type; determine a frame count distance between the first encoded video data frame and the further encoded video data frame; compare the frame count distance to a previous frame count distance; and determine whether the comparison indicates that a change in video scene of the video sequence has occurred.
The change in video scene of the video sequence is preferably indicated by the frame count distance being smaller than the previous frame count distance.
The frame count distance is preferably adjusted to compensate for the at least one encoded video data frame by adding the number of the at least one encoded video data frame to the frame count distance.
The encoded video data frame is preferably contained in a data structure, and the video frame detector configured to detect an encoded video data frame of the first video frame type may be further configured to decode at least one field in the data structure header.
The first video frame type is preferably an intra coded video frame.
The data structure is preferably a video object plane, and the at least one field in the data structure is preferably at least one of: video object plane start code; and video object plane coding type.
The plurality of encoded video data frames are grouped into packets, each packet may comprise a packet header, and the data stream parser configured to detect the absent at least one encoded video data frame may be further configured to: read from a first data packet a first value of the data packet header, the first data packet comprising a first encoded video data frame; read from a second data packet a second value of the data packet header, the second data packet comprising a second encoded video data frame; calculate a difference value between the first value of the data packet header of the first data packet and the second value of the data packet header of the second data packet; and determine the difference value is indicative of the absent at least one encoded video data frame between the first encoded video data frame and the second encoded video data frame.
The packets are real time transport protocol packets, the payload of each real time transport protocol packet may comprise at least one data structure containing the encoded video data frame, and the data packet header is preferably at least one of: a real time transport protocol time stamp; and a real time transport protocol sequence number.
According to a fourth aspect of the present invention there is provided a computer program product in which a software code is stored in a computer readable medium, wherein said code realizes the following when being executed by a processor: detecting in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determining whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings, in which: Figure 1 shows schematically an electronic device employing embodiments of the invention; Figure 2 shows schematically a user equipment suitable for employing embodiments of the invention; Figure 3 further shows schematically electronic devices employing embodiments of the invention connected using wireless and wired network connections; Figure 4 shows schematically a decoder system deploying an embodiment of the invention; Figure 5 shows schematically a decoder deploying embodiments of the invention; Figure 6 shows a flow diagram illustrating in further detail a part of the operation of an embodiment of the decoder as shown in Figure 5; Figure 7 shows schematically in further detail a part of the decoder deploying embodiments of the invention; Figure 8 shows schematically a representation of a sequence of video frames; Figure 9 shows a flow diagram illustrating in further detail a part of the operation of an embodiment of the video scene change detector of the decoder as shown in Figure 7; Figure 10 shows the effect of using temporal based and spatial based error concealment on a video sequence which does not comprise a change of video scene; and Figure 11 shows the effect of using temporal based and spatial based error concealment on a video sequence which comprises a change of video scene.
The following describes in further detail suitable apparatus and possible mechanisms for the provision of concealment of errors within a video decoder. In this regard reference is first made to Figures 1 and 2 which show a schematic block diagram of an exemplary apparatus or electronic device 50, which may incorporate a decoder according to an embodiment of the invention, and a possible physical representation of the same apparatus.
The electronic device 50 may for example be a mobile terminal or user equipment of a wireless communication system. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may require encoding and decoding or encoding or decoding video images.
The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 further may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any suitable display technology suitable to display an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display. The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery 40 (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
The apparatus may further comprise an infrared port 42 for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
The apparatus 50 may comprise a controller 56 or processor for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in embodiments of the invention may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller 56.
The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 further may comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting and receiving radio frequency signals generated at the radio interface circuitry 52.
In some embodiments of the invention, the apparatus 50 comprises a camera capable of recording or detecting individual frames which are then passed to the codec 54 or controller for processing. In other embodiments of the invention, the apparatus may receive the video image data for processing from an adjacent device prior to transmission and/or storage. In other embodiments of the invention, the apparatus 50 may receive either wirelessly or by a wired connection the image for coding/decoding.
With respect to Figure 3, a system within which embodiments of the present invention can be utilised is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks.
The system 10 may comprise any combination of wired or wireless networks including, but not limited to, a wireless cellular telephone network (such as a GSM, UMTS, CDMA network etc), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
The system 10 may include both wired and wireless communication devices or apparatus 50 suitable for implementing embodiments of the invention.
For example, the system shown in Figure 3 shows a mobile telephone network 11 and a representation of the Internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
The example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination personal digital assistant (PDA) and mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22. The apparatus may be stationary or mobile when carried by an individual who is moving. The apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an aeroplane, a bicycle, a motorcycle or any similar suitable mode of transport.
Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24. The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the Internet 28. The system may include additional communication devices and communication devices of various types.
The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11 and any similar wireless communication technology. A communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection.
Embodiments of the invention are now described in more detail with respect to Figures 4 to 11.
The general operation of video decoders as employed by embodiments of the invention is shown in Figure 4. A general decoding system 402 is illustrated schematically in Figure 4. The system 402 may comprise a storage or media channel (also known as a communication channel) 406 and a decoder 408.
The decoder 408 decompresses the bitstream 412 and produces an output video signal 414. The bit rate of the bit stream 412 and the quality of the output video signal 414 in relation to the input signal 410 are the main features which define the performance of the coding system 402.
Figure 5 shows schematically a decoder 408 suitable for carrying out embodiments of the invention. Furthermore, with respect to Figure 6, the operation of the decoder exemplifying embodiments of the invention specifically with respect to the selection of the error concealment coding process is shown in detail.
The decoder 408 comprises an input 502 from which the encoded stream 412 may be received via the media channel 406. The input 502 may be connected to a network interface unit 501. The network interface unit 501 may be configured to receive encoded data from a media or communication channel, whereby the received data may be stored and unpacked. The output from the network interface unit 501 may be connected to the decoding unit 503.
In some embodiments the network interface unit 501 may be connected to the decoding unit 503 via at least two separate connections as depicted in Figure 5.
The first connection 521 may be configured to convey unpacked video data to the decoding unit 503, and the second connection 523 may be configured to carry any associated packet header information.
In further embodiments the network interface unit 501 may be connected to the decoding unit 503 via a single connection. This single connection may be configured to convey both unpacked video and any associated packet header data.
In such embodiments the associated packet header information may be embedded within the video data stream.
The decoding unit 503 may comprise three functional entities: a data stream parser 505, a video decoder 507, and an error concealment processor 511. The decoding unit may receive the encoded video data stream and accompanying packet header information at the decoding unit 503 via the data stream parser 505. The data stream parser 505 may provide separate connections to each of the video decoder 507 and the error concealment processor 511. The connection 527 from the data stream parser 505 to both the video decoder 507 and error concealment processor 511 may convey signalling data associated with the process of error concealment.
The connection 529 from the data stream parser 505 to the video decoder may convey video decoding parameters. The outputs from each of the video decoder 507 and the error concealment processor 511 may be connected to the output of the decoding unit 503.
It is to be understood that some of these entities may be implemented as a single functional block. For example, the video decoder 507 and error concealment processor 511 may share some functional processing elements and therefore in further embodiments they may be implemented as a single functional element.
Further it is to be understood that entities such as these may often share the same filter memories which may be utilised in order to assist in any transition from one entity to the next.
In a first embodiment the network interface unit 501 may be connected via the input 502 to a packet switched network. In such a network the encoded video signal may be transmitted in packet form using a packet structure according to the Real Time Transport Protocol (RTP), which may be encapsulated in the User Datagram Protocol (UDP), and further encapsulated by the Internet Protocol (IP). Contained within the RTP packet structure there may be found the Time Stamp (TS) field and the Sequence Number (SN) field.
The TS field may be configured to reflect the sampling instance of the first octet in the RTP data packet. In deployments of the RTP protocol the time stamp may be derived from a system clock which increments monotonically and linearly in time.
This information may then be used to provide information on the temporal difference between RTP packets transmitted in the same RTP session, which may be used to restore the temporal order of received frames.
The SN field reflects the order in which the RTP packets are sent within an RTP session and therefore may be used by a receiver to detect packet loss and to restore the sequence order of received packets.
In some embodiments video coding frames may be encoded either as intra coded frames or inter coded frames. Intra coded frames may be encoded using prediction techniques which rely solely on pixel values within the frame, such as spatial prediction, whereas inter coded frames may exploit temporal correlations between successive video frames in addition to any spatial correlations within the frame. For example, an inter coded video frame may employ motion compensation in order to exploit temporal correlations between frames. Therefore typically an intra coded frame may be encoded using more bits than an inter coded frame.
In some embodiments the number of bits required to represent an intra coded video frame may exceed the payload capacity of an RTP packet, whereas the number of bits associated with an inter coded frame may be within the payload capacity of a RTP packet. Consequently, bits representing the intra coded video frame may be distributed across a number of separate RTP packets.
For example, in a typical operating scenario coding bits associated with an intra coded video frame may occupy up to two RTP packets, whereas coding bits associated with an inter coded video frame may occupy a single RTP packet.
In some embodiments the network interface unit 501 may, as part of the parsing process, unpack the payload data from the stream of received RTP packets and pass codec specific payload information to the decoding unit 503.
In some embodiments of the invention the network interface unit 501 may be used to reconstruct the timeline of the received RTP packets. This may be done by monitoring the TS field found within the header of each received RTP packet. The network interface unit 501 may then determine if any RTP packets are missing from the received stream by monitoring for any breaks in the received timeline. It is to be understood that the length of the break in the received timeline may be proportional to the number of missing RTP packets and therefore the number of missing encoded video data frames.
In further embodiments of the invention the network interface unit 501 may also be configured to determine if there are any missing packets within the received data stream by monitoring the value of the SN field found in the received packets' RTP header. In this embodiment the number of missing RTP packets may be indicated by a discontinuity or break in the incremental ordering of the SN field. The number of missing packets may be determined by the difference between the SN field value after the break, and the SN field value before the break.
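The SN-based loss check described here can be sketched in C. This is a minimal illustration under stated assumptions, not part of any standard: the function name is hypothetical, and the arithmetic relies on RTP sequence numbers being 16-bit values that wrap modulo 65536.

```c
#include <stdint.h>

/* Hypothetical helper: given the SN of the last packet received before a
   break and the SN of the first packet after it, return how many RTP
   packets were lost in between.  RTP sequence numbers are 16 bits wide,
   so the subtraction is performed modulo 65536 to survive wrap-around. */
static int missing_packet_count(uint16_t sn_before, uint16_t sn_after)
{
    /* For consecutive packets sn_after == sn_before + 1, giving 0 losses. */
    return (uint16_t)(sn_after - sn_before) - 1;
}
```

With one VOP per RTP packet, the returned count equals the number of missing encoded video data frames.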
In other embodiments of the invention the network interface unit 501 may be implemented as a packet receiver whereby the parsing and missing packet detection functionality may be incorporated as part of the subsequent decoding unit 503.
The output from the network interface unit 501 may be used to convey the encoded video data retrieved from the RTP stream to the decoding unit 503; this may be achieved via connection 521. In addition the encoded video data may be accompanied by packet header information which may provide information indicative of any packets that may be missing. As described above this packet header information may be formed or derived from the TS and/or SN RTP header fields.
In some embodiments the packet header information stream may be conveyed to the data stream parser 505 via a separate connection 523.
In other embodiments the packet header information stream may be embedded as part of the encoded video data stream which is in turn conveyed via connection 521.
In further embodiments the accompanying packet header information may contain the actual data relating to the TS and/or SN RTP header fields. In these embodiments the recipient of the packet header information stream, the data stream parser 505 in the decoding unit 503, may perform the necessary processing in order to evaluate the co-received encoded video data for missing frames.
The data stream parser 505 is configured to parse the video data frames and in some embodiments further determine the status of the frame.
The operation of parsing the encoded video data frame will hereafter be described in more detail by the processing steps depicted in Figure 6.
The decoding unit 503, and more specifically the data stream parser 505 is configured to receive the input unpacked encoded video data stream via connection 521 together with accompanying packet header information via connection 523 via the network interface unit 501.
The process of receiving the unpacked encoded video data stream together with any accompanying packet header information is shown as step 601 in Figure 6.
Initially the data stream parser 505 may monitor the accompanying signalling stream in order to ascertain if any RTP packets and hence video frames are missing from the encoded video stream.
In embodiments this may be performed in conjunction with parsing the unpacked encoded video data stream.
Typically an encoded video data stream is partitioned into a number of object orientated data structures which facilitate the containment and transmission of encoded video data. These object orientated data structures may be arranged such that they allow encoded video data to be contained in a flexible manner, thereby enabling different types of encoded video data to be packaged in a standardised form.
In some embodiments the encoded video stream may be packaged according to the international standard ISO/IEC 14496-2 Coding of visual objects, whereby each encoded video frame may be contained in a data structure known as a video object plane (VOP). Therefore in such an embodiment the encoded video stream may be contained within a plurality of VOPs. The start of the VOP data structure may in some embodiments be given by the table below.
    VideoObjectPlane() {          No. of bits
        vop_start_code            32
        vop_coding_type           2

where it can be seen that the first word in the VOP data structure VideoObjectPlane() is the vop_start_code. The vop_start_code is a fixed 32 bit word which signifies the start of a new VOP. The next word is the vop_coding_type which signifies the type of video frame contained within the VOP. This word may take one of four values as denoted by the following table.
    vop_coding_type    coding method
    00                 intra-coded (I)
    01                 predictive-coded (P)
    10                 bidirectionally-predictive-coded (B)
    11                 sprite (S)

It can be seen from the above table that each value of the word corresponds to a different type of video encoded frame contained within the VOP.
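By way of illustration, the parsing of the two words described above can be sketched in C. This is a hypothetical sketch, not the decoder of the embodiments: the function name is invented, and it assumes the MPEG-4 Part 2 byte pattern 0x000001B6 for the vop_start_code, with the vop_coding_type carried in the two most significant bits of the byte that follows.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parser sketch: scan a buffer for the 32-bit vop_start_code
   (0x000001B6 in MPEG-4 Part 2) and report the vop_coding_type, which is
   carried in the two most significant bits of the following byte.
   Returns the coding type (0 = I, 1 = P, 2 = B, 3 = S), or -1 if no VOP
   start code is found in the buffer. */
static int find_vop_coding_type(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 4 < len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 &&
            buf[i + 2] == 0x01 && buf[i + 3] == 0xB6) {
            return buf[i + 4] >> 6;   /* top 2 bits = vop_coding_type */
        }
    }
    return -1;  /* no VOP start code in this buffer */
}
```

In practice a decoder would parse the remaining VOP header fields as well; only the two words discussed above are examined here.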
The first value, which corresponds in some embodiments to vop_coding_type "00", signifies that the video encoded frame contained within the VOP is of intra frame coding type and may be referred to as an I frame. In other words the macroblocks comprising the video encoded frame may be encoded by exploiting the correlation among pixels from adjacent blocks within the same frame.
The second value, which corresponds in some embodiments to vop_coding_type "01", signifies that the video encoded frame contained within the VOP is of inter frame predictive coding type and may be referred to as a P frame. In other words the macroblocks comprising the video encoded frame may be encoded by exploiting correlative behaviour between macroblocks of neighbouring frames. For instance a macroblock from a P frame may be encoded with the use of a motion compensation vector which is referenced to a corresponding macroblock from a neighbouring past reference frame.
The third value, which corresponds in some embodiments to vop_coding_type "10", signifies that the video encoded frame contained within the VOP is of inter frame bi-predictive coding type and may be referred to as a B frame. In a B frame the macroblocks comprising the video encoded frame may be encoded by exploiting the correlative behaviour of at least two other corresponding macroblocks from different neighbouring frames. Typically this may take the form of averaging motion compensation vectors from each of the two other macroblocks in order to achieve the predictive effect.
The fourth value, which corresponds in some embodiments to vop_coding_type "11", signifies that the video encoded frame contained within the VOP is of sprite type or S type.
In some embodiments the video stream may have been encoded according to the MPEG4 Simple profile as specified in the international standard ISO/IEC 14496-2 Coding of visual objects. Consequently any encoded video stream conforming to the MPEG4 Simple profile may comprise just I and P frames.
However, it is to be appreciated in further embodiments that the decoder 408 may be configured to decode encoded video streams which may conform to further operating profiles of the international standard ISO/IEC 14496-2. It is to be further appreciated in these embodiments that the encoded video stream may additionally comprise further types of encoded video frame in addition to the I and P frames mentioned in connection with the MPEG4 Simple profile above.
According to the Internet Engineering Task Force (IETF) standard RFC 3016 RTP Payload Format for MPEG-4 Visual/Audio Streams, RTP packets may be configured to have a payload of either a single video packet or multiple video packets where each packet may be in the form of a VOP. Additionally, RTP packets may be configured such that video packets (or VOPs) may be fragmented across consecutive packets.
In some embodiments, particularly for wireless applications, the incoming RTP stream may be configured such that there is one VOP video packet for each RTP packet.
Figure 7, shows according to some embodiments a block diagram depicting in further detail components which may be present within the data stream parser 505.
In some embodiments the input signalling stream 523 to the data stream parser 505 may be connected to a missing packet detector 701 in Figure 7, and the input connection 521 which may convey the RTP payload data (in other words the unpacked video data) may be connected to the video scene change detector 703 also depicted in Figure 7.
Additionally the input connection 521 may also be connected to an Unpacker 707.
The Unpacker 707 may be arranged in some embodiments of the invention to unpack the input RTP payload data in order that the encoded video data may be decoded by the subsequent video decoder 507. The unpacked encoded video data may be conveyed to the video decoder along the connection 529.
The missing packet detector 701 may in some embodiments monitor the input signalling stream 523 in order to ascertain if any RTP packets and hence VOP encapsulated video frames are missing.
It is to be understood in some embodiments that the timestamp field in the RTP packet uniquely indicates the sampling instance of the VOP frame contained within the packet.
Thus in these embodiments the data stream parser 505 may be arranged such that any lost RTP and hence VOP encapsulated video frames may be detected by monitoring the time line of the received packets, as indicated by the sequence of received timestamp fields.
As mentioned previously in some other embodiments the process of determining missing RTP packets may be performed by the network interface unit 501. In these embodiments the function of the missing packet detector 701 may be confined to just monitoring the input signalling stream 523 for indications from the network interface unit 501 that packets have been detected as missing.
The process of parsing network header information in order to determine missing video frames is depicted as the processing step 603 in Figure 6.
The video scene change detector 703 may in some embodiments be arranged to parse the RTP payload data, in other words the unpacked video data stream, for the start of a video object plane (VOP).
In some embodiments the start of a VOP may be determined by parsing the encoded video data stream for the header vop_start_code which signifies the start of a new VOP.
Once the vop_start_code has been detected the scene change detector may then decode vop_coding_type in order to determine the frame type of the received video frame.
Typically a video coded sequence may be arranged as a plurality of groups with each group comprising a number of video frames.
In some embodiments each group of video frames may start with an Intra coded (I) frame and then be followed by a sequence of inter predicatively coded frames such as the predicted P frame and the bi-directional predicted B frame.
In other embodiments each group of video frames may contain an Intra coded (I) frame near the beginning of the group. In these embodiments the start of the group is not necessarily denoted by the Intra coded (I) frame.
The number of frames or size of a group may vary according to whether inter prediction (or motion compensation) is effective across the frames which constitute the group. In other words the number of frames in a group may be determined by the source material at the encoder. If the encoder determines that there is little temporal correlation across subsequent video frames then a new group will be formed whereby a new Intra coded (I) frame will be generated. In this operating instance the groups may be shorter in terms of the number of frames. Alternatively, if the encoder determines that there is significant correlation between successive frames then the size of the groups may be longer and allowed to run to their maximum size.
In some instances a new group may start when there is a scene cut or a change of scene. Consequently in these instances, the start of a new group will result in an Intra coded (I) frame in the received sequence of video frames.
It is to be understood that for a video scene which remains constant over a period of time the number of frames in a group of frames may typically be determined by the allowable size of a group. This may have the resulting effect of having a reasonably constant number of frames in each group within the received video stream. Consequently the frame count interval between an Intra coded (I) frame from one group to an Intra coded (I) frame from the next group may also be reasonably constant over the course of the received video stream.
As stated above a scene change may result in a start of a new group of frames which in turn may result in a new Intra coded (I) frame within the sequence of frames of the received video stream. Typically, this Intra coded (I) frame will not be located at a position which is compatible with the previously established interval between consecutive Intra coded (I) frames. Rather, the new Intra coded (I) frame may have a position in the video stream which is somewhere in between the previously received Intra coded (I) frame position and that of a predicted position for a subsequent Intra coded (I) frame. In other words, the predicted position of a subsequent Intra coded (I) frame is the position associated with an Intra coded (I) frame if a scene change had not occurred.
The change in the position of the Intra coded (I) frame during a change in scene can be readily seen from Figure 8.
Figure 8 depicts an example of a video stream that a typical streaming video client might receive. In this particular example 801 represents a stream of received video frames where an Intra coded (I) frame may be represented by the letter I and a predicted frame may be represented by the letter P. It can be seen from 801 that a regular interval of ten predicted P frames between each Intra coded (I) frame has been established.
Additionally, Figure 8 depicts the effect of a scene change on the distribution of Intra coded (I) frames within the video stream. This effect may be represented by the position of the Intra coded (I) frame 8011 where it can be seen that this frame results in a discontinuity to the previously established regular interval pattern between consecutive I frames.
It is to be appreciated in some embodiments that the above described discontinuity or change in interval between consecutive Intra coded (I) frames may be used as an indication of a change in scene within the video stream.
It is to be understood therefore that in some embodiments the video scene change detector 703 may be arranged to monitor the pattern of received frame headers in order to detect a discontinuity in the regular interval pattern between consecutive Intra coded (I) frames.
In some embodiments this may entail that the video scene change detector 703 is arranged to decode the header within the RTP payload of each received video coding packet.
In a first group of embodiments the video scene change detector may decode the video coding type field associated with each VOP object header in order to determine the video coding frame type.
The video scene change detector 703 may repeat the process of determining the video coding frame type for each received video packet.
In some embodiments the video scene change detector 703 may determine for each group of frames the frame count position of the Intra coded (I) frame within the group. Once the frame count position of an intra coded (I) frame within a group has been determined the video scene change detector 703 may then monitor the incoming video packet stream for the next Intra coded (I) frame. Once the location of the next Intra coded (I) frame has been detected the video scene change detector 703 may determine the distance according to the number of video frames between the two successive Intra coded (I) frames.
It is to be appreciated in embodiments that the distance between two successive Intra coded (I) frames is an indication of the size of the group.
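A minimal sketch of such a frame count is given below, assuming the detector is fed the coding type of each frame in arrival order; the function and variable names are hypothetical.

```c
/* Hypothetical frame-by-frame tracker: feed it the coding type of each
   received frame (1 for an Intra coded (I) frame, 0 otherwise).  It
   returns the frame count distance between the current I frame and the
   previous one, or 0 while no pair of I frames has been seen yet. */
static int frame_counter = 0;   /* frames received so far */
static int last_intra_pos = 0;  /* position of the last I frame, 0 = none */

static int intra_distance(int is_intra_frame)
{
    int dist = 0;
    frame_counter++;
    if (is_intra_frame) {
        if (last_intra_pos > 0)
            dist = frame_counter - last_intra_pos;  /* group size */
        last_intra_pos = frame_counter;
    }
    return dist;
}
```

For the sequence I P P P I the second I frame yields a distance of 4, i.e. a group of four frames.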
The operation of the video scene change detector 703 may be described in more detail with reference to the flow chart in Figure 9.
In a first group of embodiments the video scene change detector 703 may initially detect an Intra coded (I) frame by decoding the vop_coding_type field and determining that the value of the field indicates an Intra coded (I) frame.
The step of detecting the next Intra coded (I) frame is depicted as processing step 901 in Figure 9.
The video scene detector 703 may then determine the distance between the next successive pair of Intra coded (I) frames from the video packet stream by decoding the video coding frame type header as described above.
It is to be understood that the next successive pair of Intra coded (I) frames may constitute the last previous Intra coded (I) frame and the next Intra coded (I) frame detected by the video scene change detector 703.
The step of determining the frame count distance between successive Intra coded (I) frame types is depicted as processing step 903 in Figure 9.
Upon determination of the frame count distance value the video scene detector 703 may compare the recently determined frame count distance value with that of the previously stored frame count distance value.
As a result of the above comparison the two frame count distance values may be found to be the same, thereby indicating that there is a regular interval value between successive groups of frames. In this instance the video scene change detector 703 may determine that there is no change of scene in the video stream.
The step of comparing the next frame count distance value against the previous stored frame count distance in order to determine if they are the same is shown as the decision step 905 in Figure 9.
If, as a result of the comparison step 905, it is determined that the current frame count distance d1 is the same as the previous stored frame count distance d_prev then it may be determined that there is a regular interval between successive Intra coded (I) frames and consequently the video scene change detector determines that there has been no change of video scene.
The process of determining that there has been no change in video scene is depicted by processing steps 907 and 909 in Figure 9.
From the previous comparison step 905 it may be determined that the current frame count distance d1 is less than the previous stored frame count distance d_prev. In this instance of operation of the video scene change detector 703 it is determined that the Intra coded (I) frame has arrived sooner than expected and consequently the video scene change detector 703 determines that there has been a change of video scene.
The process of determining that there has been a change in video scene is depicted by processing steps 911 and 913 in Figure 9.
Alternatively it may be determined from the previous comparison step 905 that the frame count distance d1 is greater than the previous stored frame count distance d_prev. In this particular instance of operation it may be determined by the video scene change detector 703 that the Intra coded (I) frame has arrived later than expected and that this indicates that there may be a new frame count distance between successive Intra coded (I) frames. In this instance, the memory of previous frame count distances may be updated with the newly determined frame count distance.
However it is to be appreciated in the first group of embodiments that the memory of previous frame count distances may have the capacity to store more than one value for the previous frame count distance. This allows the video scene change detector 703 to revert back to an earlier stored value for the previous frame count distance should it be subsequently determined that the update in previous frame count was in error. For instance, in one operating scenario there may be an update to the previous frame count distance for the case when it is deemed that the interval between successive Intra coded frames has increased. However, following calculation of the next frame count interval distance it may be determined that the distance is the same as an earlier value for the previous frame count distance. In this instance it may be ascertained that the later value for the previous frame count distance is erroneous and that its value should be reverted back to an earlier previously stored frame count distance. Further, the earlier previously stored frame count distance may preferably be associated with a frame count distance which has consistently maintained a steady value.
It is to be further appreciated that the above described mechanism may ensure that the video scene change detector 703 does not enter into a deadlock situation, whereby the supposed interval between consecutive Intra coded (I) frames is erroneously increased to such a level that all subsequent detected Intra coded (I) frames are deemed to have arrived sooner than expected and therefore are erroneously associated with a video scene change.
The process of determining that there has been no change in video scene and that there has been a change in the frame count distance between successive Intra coded (I) frames is depicted as processing steps 915 and 917 in Figure 9.
The process of updating the previous stored frame count memory is depicted as processing step 919 in Figure 9.
Further, in some embodiments the video scene change detector 703 may be arranged to receive an input from the missing packet detector 701.
The connection from the missing packet detector 701 to the video scene change detector 703 may be used to convey information relating to the number and position of lost frames within a group of frames should the missing packet detector 701 detect any lost packets. This is particularly appropriate for embodiments which deploy an RTP payload profile of one video coding frame per RTP packet, whereby the SN field in the RTP packet header may be used to determine which video frame is missing from the received encoded stream.
In some embodiments the information relating to the number and position of lost frames may be used by the video scene change detector to ensure that the correct number of frames is used when determining the frame count interval between successive intra coded (I) frames. For example, in such operating conditions the video scene change detector 703 may compensate for lost frames by inserting "dummy" or pseudo frames into the encoded sequence whilst determining the frame count interval between successive intra coded (I) frames.
In a further example, the video scene change detector 703 may compensate for lost frames by adding the number of lost frames to the current frame count distance d.
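By way of illustration only, the compensation described above might be realised as in the following sketch. The function and variable names are hypothetical and not taken from the embodiments; the sketch assumes an RTP payload profile of one video coding frame per RTP packet, so that a gap in the 16-bit sequence number (SN) field equals the number of lost frames.

```c
#include <stdint.h>

/* Hypothetical helper: number of packets lost between two consecutively
 * received RTP sequence numbers, allowing for the 16-bit wrap-around of
 * the SN field. With one video frame per RTP packet this equals the
 * number of lost frames. */
static int lost_frames_between(uint16_t prev_sn, uint16_t curr_sn)
{
    /* Unsigned 16-bit arithmetic handles the SN wrap-around. */
    uint16_t delta = (uint16_t)(curr_sn - prev_sn);
    return (delta > 0) ? (int)delta - 1 : 0;
}

/* The current frame count distance d may then be compensated by adding
 * the number of lost frames, as described above. */
static int compensated_distance(int d, uint16_t prev_sn, uint16_t curr_sn)
{
    return d + lost_frames_between(prev_sn, curr_sn);
}
```

For instance, if packets with sequence numbers 10 and 13 are received consecutively, two frames were lost and a current distance of 5 would be compensated to 7.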
The output from the video scene change detector 703 may be a signal indicating whether there has been a change in the video scene for the most recent group of frames.
The method of monitoring the interval between successive video packet headers associated with Intra coded (I) frames in order to determine if a newly received frame corresponds to a scene change may be expressed in C programming language code such as that shown below.
Initialize all variables needed for scene change detection:

    intra_interval = 0;
    prev_intra_number = 0;
    prev_intra_interval = 0;
    intra_was_scene_cut = FALSE;

Call the following method for every I-frame during playback. The parameter frame_number is the number of the current I-frame, where 1 is the number of the first frame of the stream.

    bool check_scene_change_intra(int frame_number)
    {
        int intra_dist;
        bool scene_cut = FALSE;

        intra_dist = frame_number - prev_intra_number;
        if (prev_intra_number > 0 && intra_dist > 0)
        {
            if (intra_dist == intra_interval)
            {
                /* Regular interval between intra frames --> timed intra.
                   Two consecutive intra intervals have now been the same, so
                   use this as the regular interval from now on and reset any
                   old backup interval value. */
                prev_intra_interval = intra_dist;
            }
            else if (intra_dist == prev_intra_interval)
            {
                /* Intra frame is coming sooner than we currently expect for
                   the regular intra interval, but the interval is the same as
                   the previously used intra interval. The most probable
                   reason is that one intra frame was lost during transmission
                   and we erroneously increased the intra interval too high.
                   Change back to the original intra interval now and consider
                   the current intra as a timed intra. */
                intra_interval = intra_dist;
            }
            else if (intra_dist < intra_interval)
            {
                /* Intra frame is coming sooner than expected --> intra frame
                   must contain a scene change. */
                scene_cut = TRUE;
            }
            else if (intra_dist > intra_interval)
            {
                /* Bigger intra interval than any interval so far. Use this
                   value as the new interval between timed intra frames. */
                prev_intra_interval = intra_interval;
                intra_interval = intra_dist;
            }
        }
        else if (frame_number == 1)
        {
            /* The first frame of the stream: use spatial error concealment
               instead of filling corrupted regions with some default colour.
               However, if the first intra is not the first frame, temporal
               error concealment may be better, because the corrupted region
               may be available as intra macroblocks from previous inter
               frames. */
            scene_cut = TRUE;
        }
        /* Use reference frame (= temporal) concealment otherwise. */

        prev_intra_number = frame_number;
        intra_was_scene_cut = scene_cut;
        return scene_cut;
    }

It is to be further appreciated that the above described method of decoding and monitoring the frame header associated with each received packet results in a system capable of detecting scene changes without the overhead of fully decoding each received video frame.
The process of detecting a change of scene in the video encoded stream is depicted as processing step 605 in Figure 6.
With reference to Figure 7 the output signal from the video scene change detector may be connected to the input of the error concealment signal generator 705.
The error concealment signal generator 705 may also be further arranged to receive an input from the missing packet detector 701. This input may be used to convey to the error concealment signal generator 705 information relating to whether the encoded stream has been received with missing encoded video frames. Additionally the information may contain details identifying which frame is missing as a result of lost RTP packets.
In some embodiments the error concealment signal generator 705 may combine the two inputs to produce the output error concealment signal 527. The output error concealment signal 527 may comprise information indicating whether error concealment is required in order to mask the effect of missing encoded frames.
Additionally the signal 527 may comprise further information indicating whether the group of frames in which the error concealment is to be performed is associated with a video scene change.
With reference to Figure 5, the input signal 527 to the error concealment processor 511 may be used to determine which type of error concealment algorithm is used.
Error concealment algorithms for video decoding can be categorised into two types: spatial error concealment (or intra frame error concealment), and temporal error concealment (or inter frame error concealment).
Spatial error concealment algorithms may comprise a process whereby pixels of an erroneous macro block are replaced by pixels from a neighbouring macro block within the same image frame. Typically the replacement strategy may involve some form of interpolation.
For example, erroneous pixels at the boundary of a macro block may be replaced by pixels from the boundary of a neighbouring macro block which are not in error.
Further, erroneous pixels not at the boundary of a macro block may be substituted by a weighted average of neighbouring pixels not in error, whereby the weighting factor may be proportional to the inverse of the distance from the erroneous pixel.
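By way of illustration only (this sketch is not part of the described embodiments), such an inverse-distance weighted interpolation might be realised for a single pixel of a 16x16 macro block as follows, assuming one correctly received boundary pixel is available on each of the four sides:

```c
#define MB_SIZE 16

/* Conceal one pixel at position (x, y) inside a 16x16 macro block by a
 * weighted average of four correctly received boundary pixels, with each
 * weight proportional to the inverse of the distance to that boundary. */
static unsigned char conceal_pixel_spatial(int x, int y, /* 0..15 inside MB */
                                           unsigned char left,
                                           unsigned char right,
                                           unsigned char top,
                                           unsigned char bottom)
{
    /* Distance of the pixel from each macro block boundary (always >= 1). */
    int dl = x + 1, dr = MB_SIZE - x, dt = y + 1, db = MB_SIZE - y;

    /* Weights proportional to the inverse of each distance. */
    double wl = 1.0 / dl, wr = 1.0 / dr, wt = 1.0 / dt, wb = 1.0 / db;

    double num = left * wl + right * wr + top * wt + bottom * wb;
    double den = wl + wr + wt + wb;
    return (unsigned char)(num / den + 0.5); /* rounded weighted average */
}
```

A pixel near the left boundary is thereby dominated by the left boundary pixel, while a central pixel receives a near-uniform blend of all four.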
Temporal error concealment algorithms may comprise a process whereby pixels from a previous frame are used to replace pixels from an erroneous macro block.
The replacement may either be by direct substitution or by using an estimated motion vector in order to compensate for the movement of the image from one image to the next.
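A minimal sketch of such a temporal replacement follows. The frame structure and function names here are illustrative assumptions, not part of the described embodiments; each pixel of the lost macro block is copied from the previous frame, displaced by an estimated motion vector (mvx, mvy), with direct substitution being the special case of a zero vector:

```c
typedef struct {
    int width, height;
    unsigned char *pixels;   /* luma plane, row-major */
} frame_t;

/* Conceal a lost 16x16 macro block at origin (mb_x, mb_y) in the current
 * frame using motion-compensated pixels from the previous decoded frame. */
static void conceal_mb_temporal(frame_t *curr, const frame_t *prev,
                                int mb_x, int mb_y,
                                int mvx, int mvy)
{
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++) {
            int dx = mb_x + x, dy = mb_y + y;  /* destination pixel */
            int sx = dx + mvx, sy = dy + mvy;  /* source in previous frame */
            /* Clamp the motion-compensated source to the frame borders. */
            if (sx < 0) sx = 0;
            if (sx >= prev->width) sx = prev->width - 1;
            if (sy < 0) sy = 0;
            if (sy >= prev->height) sy = prev->height - 1;
            curr->pixels[dy * curr->width + dx] =
                prev->pixels[sy * prev->width + sx];
        }
    }
}
```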
It is to be appreciated that the choice of error concealment algorithm may affect the overall quality of the error concealed image frame. For example, if the video sequence does not contain a scene change, then it may be preferable to use temporal error concealment rather than spatial error concealment in order to mask the errors in the corrupted image frame. This is because temporal error concealment is better suited to exploiting the high level of correlation between macro blocks of consecutive image frames.
Conversely, if the video sequence does contain a change of scene then it may be preferable to use a spatial error concealment algorithm to mask the errors of the corrupted image frame. This is because in this instance the correlation within the frame may be higher than the correlation across multiple frames of the sequence.
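The selection rule described in the two preceding paragraphs can be summarised in a short sketch. The enum and function names are hypothetical; the two inputs correspond to the two pieces of information the error concealment signal may carry:

```c
typedef enum { CONCEAL_NONE, CONCEAL_TEMPORAL, CONCEAL_SPATIAL } conceal_t;

/* Select the error concealment type from whether frames are missing and
 * whether the affected group of frames spans a video scene change. */
static conceal_t select_concealment(int frames_missing, int scene_change)
{
    if (!frames_missing)
        return CONCEAL_NONE;       /* frame is good: decode normally */
    return scene_change ? CONCEAL_SPATIAL    /* intra-frame correlation stronger */
                        : CONCEAL_TEMPORAL;  /* inter-frame correlation stronger */
}
```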
Figure 10 depicts the effect different types of error concealment algorithm may have on an error concealed video frame in which the video frame is part of a sequence of video frames without a change of scene.
With reference to Figure 10; image 1001 portrays a decoded video frame whereby there are no errors in the encoded video stream, image 1002 portrays the decoded image frame whereby there are errors in the bottom half of the encoded video stream and temporal based error concealment has been used to mask these errors, and image 1003 portrays the same decoded video frame whereby spatial based error concealment has been used to mask the errors.
It is apparent from Figure 10 that spatial based error concealment may produce poorer results than temporal based error concealment in this instance, as a result of the temporal based error concealment process exploiting the stronger inter frame correlative behaviour.
Figure 11 depicts the effect the different types of error concealment algorithm have on an error concealed video frame in which the video frame is part of a sequence of video frames in which there is a change of scene. The change of scene may be represented as the transition from an image such as 1001 in Figure 10 to the image 1101 in Figure 11, with image 1101 being the image immediately after the change of scene.
With reference to Figure 11; image 1101 portrays a decoded video frame in which there are no errors in the encoded video stream, image 1102 portrays the same decoded image frame whereby there are errors in the bottom half of the encoded video stream and temporal based error concealment has been used to mask these errors, and image 1103 portrays the same decoded video frame whereby spatial based error concealment has been used to mask the errors.
It is apparent from Figure 11 that, because of the change of scene within the video stream, in this instance spatial based error concealment is more suited than temporal based error concealment. This is because the correlation between macro blocks within the same image frame is stronger than the correlation between equivalent macro blocks across consecutive image frames.
The process of determining which type of error concealment to apply according to whether there is a change in scene within the video encoded stream is depicted as processing step 607 in Figure 6.
It is to be understood therefore that in some embodiments the signal 527 may comprise not only information indicating whether error concealment should be performed but also information indicating which type of error concealment algorithm should be used in the case of error concealment.
The process of generating the error concealment signal via the error concealment signal generator is depicted as processing step 609 in Figure 6.
With reference to Figure 5, it is to be understood that the data signal 527 may also be connected to the input of the video decoder 507 and used to indicate if the current video frame or series of frames requires decoding. In other words, if the signal 527 indicates that the error concealment processor is to be used to produce the output video frame, then the video decoder may not be activated. If, however, the signal connection 527 indicates that the video frame is good, then the video decoder may be activated to decode the encoded video stream conveyed on the signal connection 521.
In some embodiments the video decoder 507 may be any block based transform video decoder which is designed specifically to decode the encoded video stream.
For example, the video decoder may conform to either the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.263 and H.264 video coding standards, or the ISO/IEC Moving Picture Experts Group 4 (MPEG-4) Advanced Video Coding standard.
Therefore in summary at least one embodiment of the invention comprises a method for video decoding comprising: detecting in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determining whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
A further embodiment of the invention may comprise a method for video decoding comprising: detecting in a bitstream comprising a plurality of encoded data frames at least one non-contiguous encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determining whether the at least one non-contiguous encoded video data frame is associated with a change of video scene within the video sequence; and selecting a type of error concealment process dependent at least in part on whether the at least one non-contiguous encoded video data frame is associated with the change in video scene of the video sequence.
The selecting a type of error concealment process dependent at least in part on whether the at least one non-contiguous encoded video data frame is associated with the change in video scene of the video sequence may comprise at least one of: selecting a first type of error concealment process when the at least one non-contiguous encoded video data frame is not associated with a change in video scene of the video sequence; and selecting a second type of error concealment process when the at least one non-contiguous encoded video data frame is associated with a change in video scene of the video sequence.
The first type of error concealment process may be a temporal error concealment process, and the second type of error concealment process may be a spatial error concealment process.
Further, the determining whether any of the determined at least one non-contiguous encoded video data frame is associated with a change in video scene of the video sequence may further comprise: detecting a first encoded video data frame of a first video frame type; detecting a further encoded video data frame of the first video frame type; determining a frame count distance between the first encoded video data frame and the further encoded video data frame; comparing the frame count distance to a previous frame count distance; and determining whether the comparison indicates that a change in video scene of the video sequence has occurred.
The change in video scene of the video sequence may be indicated by the frame count distance being smaller than the previous frame count distance.
The frame count distance may be adjusted in order to compensate for the at least one encoded video data frame by adding the number of the at least one encoded video data frame to the frame count distance.
The encoded video data frame may be contained in a data structure, and detecting an encoded video data frame of the first video frame type may comprise decoding at least one field in the data structure header.
The first video frame type may be an intra coded video frame.
The data structure may be a video object plane, and the at least one field in the data structure may at least be one of: a video object plane start code, and a video object plane coding type.
The plurality of encoded video data frames may be grouped into packets, and each packet may comprise a packet header, and detecting the at least one non-contiguous encoded video data frame comprises: reading from a first data packet a first value of the data packet header, the first data packet comprising a first encoded video data frame; reading from a second data packet a second value of the data packet header, the second data packet comprising a second encoded video data frame; calculating a difference value between the first value of the data packet header of the first data packet and the second value of the data packet header of the second data packet; and determining the difference value is indicative of the at least one non-contiguous encoded video data frame.
Although the above examples describe embodiments of the invention operating within a codec within an electronic device, it would be appreciated that the invention as described above may be implemented as part of any video codec.
Thus, for example, embodiments of the invention may be implemented in a video codec which may implement video coding over fixed or wired communication paths.
Thus user equipment may comprise a video codec such as those described in embodiments of the invention above.
It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
Furthermore elements of a public land mobile network (PLMN) may also comprise video codecs as described above.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Therefore in summary at least one embodiment of the invention comprises an apparatus configured to: detect in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determine whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and select a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multi-core processor architecture, as non-limiting examples.
Therefore in summary at least one embodiment of the invention comprises a computer program product in which a software code is stored in a computer readable medium, wherein said code realizes the following when being executed by a processor: detecting in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determining whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims.
However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims (34)

CLAIMS: 1. A method for video decoding comprising: detecting in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determining whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
2. The method for video decoding as claimed in claim 1, wherein selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence comprises at least one of: selecting a first type of error concealment process when the absent at least one encoded video data frame is not associated with a change in video scene of the video sequence; and selecting a second type of error concealment process when the absent at least one encoded video data frame is associated with a change in video scene of the video sequence.
3. The method for video decoding as claimed in claim 2, wherein the first type of error concealment process is a temporal error concealment process, and wherein the second type of error concealment process is a spatial error concealment process.
4. The method for video decoding as claimed in any one of claims 1 to 3, wherein determining whether any of the determined absent at least one encoded video data frame is associated with a change in video scene of the video sequence further comprises: detecting a first encoded video data frame of a first video frame type; detecting a further encoded video data frame of the first video frame type; determining a frame count distance between the first encoded video data frame and the further encoded video data frame; comparing the frame count distance to a previous frame count distance; and determining whether the comparison indicates that a change in video scene of the video sequence has occurred.
5. The method for video decoding as claimed in claim 4, wherein the change in video scene of the video sequence is indicated by the frame count distance being smaller than the previous frame count distance.
6. The method for video decoding as claimed in claims 4 and 5, wherein the frame count distance is adjusted to compensate for the at least one encoded video data frame by adding the number of the at least one encoded video data frame to the frame count distance.
7. The method for video decoding as claimed in claims 4, 5 and 6, wherein the encoded video data frame is contained in a data structure, and wherein detecting an encoded video data frame of the first video frame type comprises: decoding at least one field in the data structure header.
8. The method for video decoding as claimed in claims 4 to 7, wherein the first video frame type is an intra coded video frame.
9. The method for video decoding as claimed in claims 7 and 8, wherein the data structure is a video object plane, and the at least one field in the data structure is at least one of: video object plane start code; and video object plane coding type.
10. The method for video decoding as claimed in claims 1 to 9, wherein the plurality of encoded video data frames are grouped into packets, and wherein each packet comprises a packet header, and wherein detecting the absent at least one encoded video data frame comprises: reading from a first data packet a first value of the data packet header, the first data packet comprising a first encoded video data frame; reading from a second data packet a second value of the data packet header, the second data packet comprising a second encoded video data frame; calculating a difference value between the first value of the data packet header of the first data packet and the second value of the data packet header of the second data packet; and determining the difference value is indicative of the absent at least one encoded video data frame between the first encoded video data frame and the second encoded video data frame.
11. The method for video decoding as claimed in claim 10, wherein the packets are real time transport protocol packets, wherein the payload of each real time transport protocol packet comprises at least one data structure containing the encoded video data frame, and wherein the data packet header is at least one of: a real time transport protocol time stamp; and a real time transport protocol sequence number.
12. An apparatus comprising at least one processor and at least one memory including computer code, the at least one memory and the computer code configured to, with the at least one processor, cause the apparatus to at least perform: detecting in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determining whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
13. The apparatus as claimed in claim 12, wherein the apparatus caused to at least perform selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence is further caused to perform the selecting of at least one of: a first type of error concealment process when the absent at least one encoded video data frame is not associated with a change in video scene of the video sequence; and a second type of error concealment process when the absent at least one encoded video data frame is associated with a change in video scene of the video sequence.
14. The apparatus as claimed in claim 13, wherein the first type of error concealment process is a temporal error concealment process, and wherein the second type of error concealment process is a spatial error concealment process.
15. The apparatus as claimed in any one of claims 12 to 14, wherein the apparatus caused to at least perform determining whether any of the determined absent at least one encoded video data frame is associated with a change in video scene of the video sequence is further caused to perform: detecting a first encoded video data frame of a first video frame type; detecting a further encoded video data frame of the first video frame type; determining a frame count distance between the first encoded video data frame and the further encoded video data frame; comparing the frame count distance to a previous frame count distance; and determining whether the comparison indicates that a change in video scene of the video sequence has occurred.
16. The apparatus as claimed in claim 15, wherein the change in video scene of the video sequence is indicated by the frame count distance being smaller than the previous frame count distance.
17. The apparatus as claimed in claims 15 and 16, wherein the frame count distance is adjusted to compensate for the at least one encoded video data frame by adding the number of the at least one encoded video data frame to the frame count distance.
18. The apparatus as claimed in claims 15, 16 and 17, wherein the encoded video data frame is contained in a data structure, and wherein the apparatus caused to perform detecting an encoded video data frame of the first video frame type is further caused to perform decoding at least one field in the data structure header.
19. The apparatus as claimed in claims 15 to 18, wherein the first video frame type is an intra coded video frame.
20. The apparatus as claimed in claims 18 and 19, wherein the data structure is a video object plane, and the at least one field in the data structure is at least one of: video object plane start code; and video object plane coding type.
21. The apparatus as claimed in claims 12 to 20, wherein the plurality of encoded video data frames are grouped into packets, wherein each packet comprises a packet header, and wherein the apparatus caused to perform detecting the absent at least one encoded video data frame is further caused to perform: reading from a first data packet a first value of the data packet header, the first data packet comprising a first encoded video data frame; reading from a second data packet a second value of the data packet header, the second data packet comprising a second encoded video data frame; calculating a difference value between the first value of the data packet header of the first data packet and the second value of the data packet header of the second data packet; and determining the difference value is indicative of the absent at least one encoded video data frame between the first encoded video data frame and the second encoded video data frame.
22. The apparatus as claimed in claim 21, wherein the packets are real time transport protocol packets, wherein the payload of each real time transport protocol packet comprises at least one data structure containing the encoded video data frame, and wherein the data packet header is at least one of: a real time transport protocol time stamp; and a real time transport protocol sequence number.
23. An apparatus comprising: a data stream parser configured to detect in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; a video frame detector configured to determine whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and an error concealment generator configured to select a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
24. The apparatus as claimed in claim 23, wherein the error concealment generator configured to select a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence is further configured to select at least one of: a first type of error concealment process when the absent at least one encoded video data frame is not associated with a change in video scene of the video sequence; and a second type of error concealment process when the absent at least one encoded video data frame is associated with a change in video scene of the video sequence.
25. The apparatus as claimed in claim 24, wherein the first type of error concealment process is a temporal error concealment process, and wherein the second type of error concealment process is a spatial error concealment process.
26. The apparatus as claimed in any one of claims 23 to 25, wherein the video frame detector configured to determine whether any of the determined absent at least one encoded video data frame is associated with a change in video scene of the video sequence is further configured to: detect a first encoded video data frame of a first video frame type; detect a further encoded video data frame of the first video frame type; determine a frame count distance between the first encoded video data frame and the further encoded video data frame; compare the frame count distance to a previous frame count distance; and determine whether the comparison indicates that a change in video scene of the video sequence has occurred.
  27. The apparatus as claimed in claim 26, wherein the change in video scene of the video sequence is indicated by the frame count distance being smaller than the previous frame count distance.
  28. The apparatus as claimed in claims 26 and 27, wherein the frame count distance is adjusted to compensate for the at least one encoded video data frame by adding the number of the at least one encoded video data frame to the frame count distance.
  29. The apparatus as claimed in claims 26, 27 and 28, wherein the encoded video data frame is contained in a data structure, and wherein the video frame detector configured to detect an encoded video data frame of the first video frame type is further configured to decode at least one field in the data structure header.
  30. The apparatus as claimed in claims 26 to 29, wherein the first video frame type is an intra coded video frame.
  31. The apparatus as claimed in claims 29 and 30, wherein the data structure is a video object plane, and the at least one field in the data structure is at least one of: video object plane start code; and video object plane coding type.
  32. The apparatus as claimed in claims 23 to 31, wherein the plurality of encoded video data frames are grouped into packets, wherein each packet comprises a packet header, and wherein the data stream parser configured to detect the absent at least one encoded video data frame is further configured to: read from a first data packet a first value of the data packet header, the first data packet comprising a first encoded video data frame; read from a second data packet a second value of the data packet header, the second data packet comprising a second encoded video data frame; calculate a difference value between the first value of the data packet header of the first data packet and the second value of the data packet header of the second data packet; and determine whether the difference value is indicative of the absent at least one encoded video data frame between the first encoded video data frame and the second encoded video data frame.
  33. The apparatus as claimed in claim 32, wherein the packets are real time transport protocol packets, wherein the payload of each real time transport protocol packet comprises at least one data structure containing the encoded video data frame, and wherein the data packet header is at least one of: a real time transport protocol time stamp; and a real time transport protocol sequence number.
  34. A computer program product in which a software code is stored in a computer readable medium, wherein said code realizes the following when being executed by a processor: detecting in a bitstream comprising a plurality of encoded video data frames the absence of at least one encoded video data frame, wherein the encoded video data frames comprise an encoded video sequence; determining whether the absent at least one encoded video data frame is associated with a change of video scene within the video sequence; and selecting a type of error concealment process dependent at least in part on whether the absent at least one encoded video data frame is associated with the change in video scene of the video sequence.
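The method claimed above can be sketched in a few lines: missing frames are detected from gaps in RTP sequence numbers (claims 32–33), a scene change is inferred when the frame count distance between successive intra-coded frames shrinks relative to the previous distance, adjusted for lost frames (claims 26–28), and the concealment type is selected accordingly (claims 24–25). This is an illustrative sketch only; all function and class names are hypothetical and not taken from the patent text.

```python
def missing_frame_count(prev_seq: int, curr_seq: int) -> int:
    """Packets lost between two 16-bit RTP sequence numbers (claim 32)."""
    return (curr_seq - prev_seq - 1) % 65536


def select_concealment(scene_changed: bool) -> str:
    # Claim 25: temporal concealment when no scene change, spatial otherwise.
    return "spatial" if scene_changed else "temporal"


class SceneChangeDetector:
    """Tracks the frame count distance between intra frames (claims 26-28)."""

    def __init__(self):
        self.frames_since_last_i = 0
        self.prev_i_distance = None

    def on_frame(self, is_intra: bool, lost_frames: int = 0) -> bool:
        # Claim 28: add the number of absent frames to the distance.
        self.frames_since_last_i += 1 + lost_frames
        if not is_intra:
            return False
        distance = self.frames_since_last_i
        self.frames_since_last_i = 0
        # Claim 27: a shorter I-frame interval indicates a scene change.
        changed = (self.prev_i_distance is not None
                   and distance < self.prev_i_distance)
        self.prev_i_distance = distance
        return changed
```

For example, after an I-frame cadence of every four frames, an I-frame arriving after only two frames would be flagged as a scene cut, steering the decoder toward spatial rather than temporal concealment for any frame lost at that point.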
GB0920926A 2009-11-30 2009-11-30 Video decoding with error concealment dependent upon video scene change. Withdrawn GB2475739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0920926A GB2475739A (en) 2009-11-30 2009-11-30 Video decoding with error concealment dependent upon video scene change.

Publications (2)

Publication Number Publication Date
GB0920926D0 GB0920926D0 (en) 2010-01-13
GB2475739A true GB2475739A (en) 2011-06-01

Family

ID=41572891

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0920926A Withdrawn GB2475739A (en) 2009-11-30 2009-11-30 Video decoding with error concealment dependent upon video scene change.

Country Status (1)

Country Link
GB (1) GB2475739A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112055218B (en) * 2020-09-15 2024-04-26 杭州萤石软件有限公司 Event reporting method, device and system
CN114745545B (en) * 2022-04-11 2024-07-09 北京字节跳动网络技术有限公司 Video frame inserting method, device, equipment and medium
CN115442524B (en) * 2022-08-23 2023-11-03 西安微电子技术研究所 Image pickup method and system, terminal equipment and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030012287A1 (en) * 2001-03-05 2003-01-16 Ioannis Katsavounidis Systems and methods for decoding of systematic forward error correction (FEC) codes of selected data in a video bitstream
WO2004008733A2 (en) * 2002-07-15 2004-01-22 Nokia Corporation Method for error concealment in video sequences
EP1589770A2 (en) * 2004-04-20 2005-10-26 Kabushiki Kaisha Toshiba Apparatus and method for decoding a moving picture sequence
US20080084934A1 (en) * 2006-10-10 2008-04-10 Texas Instruments Incorporated Video error concealment
WO2008118801A2 (en) * 2007-03-23 2008-10-02 Qualcomm Incorporated Methods of performing error concealment for digital video

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"IEEE Transactions on Multimedia", published 1 February 2004, Vol. 6, No. 1, pp 158-173 (XP011105813) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140301486A1 (en) * 2011-11-25 2014-10-09 Thomson Licensing Video quality assessment considering scene cut artifacts
JP2015502713A (en) * 2011-11-25 2015-01-22 トムソン ライセンシングThomson Licensing Video quality assessment considering scene cut artifacts
EP2783513A4 (en) 2011-11-25 2015-08-05 Video quality assessment taking into account scene cut artifacts
RU2597493C2 (en) * 2011-11-25 2016-09-10 Томсон Лайсенсинг Video quality assessment considering scene cut artifacts

Also Published As

Publication number Publication date
GB0920926D0 (en) 2010-01-13

Similar Documents

Publication Publication Date Title
US7215712B2 (en) Systems and methods for decoding of partially corrupted reversible variable length code (RVLC) intra-coded macroblocks and partial block decoding of corrupted macroblocks in a video decoder
US11039144B2 (en) Method and apparatus for image coding and decoding through inter-prediction
CN112218073B (en) Video encoding method, decoding method and terminal
KR100761181B1 (en) System decoder device and packet data correcting method
US9414086B2 (en) Partial frame utilization in video codecs
CN101796846A (en) Feedback-based scalable video coding
BRPI0011748B1 (en) encoding and decoding method and apparatus, wireless communication apparatus with embedded encoder, and video codec
JP2001285897A (en) Device for evaluating reception quality of moving picture
US9264737B2 (en) Error resilient transmission of random access frames and global coding parameters
US20050259875A1 (en) Image encoding apparatus and image encoding method
GB2475739A (en) Video decoding with error concealment dependent upon video scene change.
FI115946B (en) Procedure for detecting errors in video information
US20090097555A1 (en) Video encoding method and device
KR20200132985A (en) Bidirectional intra prediction signaling
US11805250B2 (en) Performing intra-prediction using intra reference sample filter switching
EP1555788A1 (en) Method for improving the quality of an encoded video bit stream transmitted over a wireless link, and corresponding receiver
CN117676146A (en) A coding and decoding method and device
CN101103634B (en) Method of determining a corruption indication of a sequence of encoded data frames
Girod et al. Error-resilient coding for H. 263
Girod et al. Error-resilient standard-compliant video coding
KR100805016B1 (en) Image error correction device and method
CN119299673A (en) Video encoding and decoding method, device, equipment, storage medium and computer program

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)